Exchange of Limits: Why Iterative Decoding Works
Abstract: We consider communication over binary-input memoryless output-symmetric channels using low-density parity-check codes and message-passing decoding. The asymptotic (in the length) performance of such a combination for a fixed number of iterations is given by density evolution. Letting the number of iterations tend to infinity we get the density evolution threshold, the largest channel parameter so that the bit error probability tends to zero as a function of the iterations.

In practice we often work with short codes and perform a 
large number of iterations. It is therefore interesting to consider 
what happens if in the standard analysis we exchange the order 
in which the blocklength and the number of iterations diverge 
to infinity. In particular, we can ask whether both limits give the 
same threshold. 

Although empirical observations strongly suggest that the exchange of limits is valid for all channel parameters, we limit our discussion to channel parameters below the density evolution threshold. Specifically, we show that under some suitable technical conditions the bit error probability vanishes below the density evolution threshold regardless of how the limit is taken.

Index Terms — LDPC, sparse graph code, density evolution 



I. Introduction 



A. Motivation 



Consider transmission over a binary-input memoryless output-symmetric (BMS) channel using a low-density parity-check (LDPC) code and decoding via a message-passing (MP) algorithm. We refer the reader to [1] for an introduction to the standard notation and an overview of the known results. It is well known that, for good choices of the degree distribution and the MP decoder, one can achieve rates close to the capacity of the channel with low decoding complexity [2].

The standard analysis of iterative decoding systems assumes that the blocklength is large (tending to infinity) and that a fixed number of iterations is performed. As a consequence, when decoding a given bit, the output of the decoder only depends on a fixed-size local neighborhood of this bit, and this local neighborhood is tree-like. This local tree property implies that the messages arriving at nodes are conditionally independent, significantly simplifying the analysis. To determine the performance in this setting, we track the evolution of the message densities as a function of the iteration. This process is called density evolution (DE). Denote the bit error probability of a code $G$ after $\ell$ iterations by $P_b(G, \epsilon, \ell)$, where $\epsilon$ is the channel parameter. Then DE computes



$$\lim_{n\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]. \qquad (1)$$



If we now perform more and more iterations then we get a 
limiting performance corresponding to 



$$\lim_{\ell\to\infty}\lim_{n\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]. \qquad (2)$$
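To make the limits (1) and (2) concrete, here is a minimal sketch of DE for one decoder and channel pair, the Gallager B algorithm over the BSC($\epsilon$) (Definition 6 in Section II), on the $(3, r)$-regular ensemble. It is our own illustration, not the authors' code: we track only the probability $x$ that a variable-to-check message is wrong, a check message is wrong when an odd number of its $r-1$ inputs are wrong, and for $l = 3$ the variable-node majority is taken over three votes, so no ties occur. Function names are hypothetical, and the cutoffs (5000 iterations, $10^{-10}$) are arbitrary numerical choices.

```python
def galb_de_step(x: float, eps: float, r: int) -> float:
    """One DE iteration for Gallager B on the BSC(eps), (3, r)-regular ensemble.

    x is the probability that a variable-to-check message is wrong.
    """
    # A check-to-variable message is wrong iff an odd number of its r-1 inputs are wrong.
    q = (1.0 - (1.0 - 2.0 * x) ** (r - 1)) / 2.0
    # Majority of {received value, two other incoming messages}: the outgoing message is
    # wrong if the received value is wrong and at least one incoming message is wrong,
    # or if the received value is right and both incoming messages are wrong.
    return eps * (1.0 - (1.0 - q) ** 2) + (1.0 - eps) * q ** 2


def de_limit(eps: float, r: int, iters: int = 5000) -> float:
    """Approximate the inner-then-outer limit (2) by running DE for many iterations."""
    x = eps  # iteration 0: variable nodes forward the channel values
    for _ in range(iters):
        x = galb_de_step(x, eps, r)
    return x


def galb_threshold(r: int, tol: float = 1e-6) -> float:
    """Bisection for the largest eps at which DE drives x to (numerically) zero."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if de_limit(mid, r) < 1e-10:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for r in (4, 6):
        print(r, round(galb_threshold(r), 4))
```

With these choices the bisection should land near the $\epsilon^{\mathrm{GalB}}$ values listed for $l = 3$ in Table I (about 0.1068 for $r = 4$ and 0.0394 for $r = 6$). A fixed iteration budget biases the estimate slightly downward, because DE slows down near the threshold.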
In order for the computation graphs of depth $\ell$ to form a tree, the number of iterations cannot exceed $c\log(n)$, where $c$ is a constant that only depends on the degree distribution.
(For an $(l, r)$-regular degree distribution pair a valid choice of $c$ is $c(l,r) = \frac{1}{\log((l-1)(r-1))}$; see [3].) In practice, this condition is rarely fulfilled: standard blocklengths are only in the hundreds or thousands, but the number of iterations that has been observed to be useful in practice can easily exceed one hundred.

Consider therefore the situation where we fix the blocklength but let the number of iterations tend to infinity. That is, we consider the limit

$$\lim_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]. \qquad (3)$$

Now take the blocklength to infinity, i.e., consider

$$\lim_{n\to\infty}\lim_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]. \qquad (4)$$

What can we say about (4) and its relationship to (2)?

Consider the belief propagation (BP) algorithm. It was shown by McEliece, Rodemich, and Cheng [4] that one can construct specific graphs and noise realizations so that the messages on a specific edge either show a chaotic behavior (as a function of the iteration) or converge to limit cycles. In particular, this means that the messages do not converge as a function of the iteration. For a fixed length and a discrete channel, the number of graphs and noise realizations is finite. Therefore, if for a single graph and noise realization the messages do not converge as a function of $\ell$, then it is likely that $\lim_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]$ does not exist either (unless by some miracle the various non-converging parts cancel). Let us therefore consider $\limsup_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]$ and $\liminf_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]$. What happens if we increase the blocklength and consider $\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]$ and $\lim_{n\to\infty}\liminf_{\ell\to\infty}\mathbb{E}[P_b(G,\epsilon,\ell)]$?

We restrict our present study to the exchange of limits below the density evolution threshold. That is, suppose that the given combination (of the channel family and the MP decoder) has a threshold in the following sense: for the given channel family characterized by the real-valued parameter $\epsilon$ there exists a threshold $\epsilon^{\mathrm{MP}}$ so that for all $0 \le \epsilon < \epsilon^{\mathrm{MP}}$ the DE limit (2) is 0, whereas for all $\epsilon > \epsilon^{\mathrm{MP}}$ it is strictly positive. We will show that under suitable technical conditions the bit error probability also tends to zero if we exchange the limits. This implies that the DE threshold is a meaningful and robust design parameter.

B. Summary of Main Result 

Consider transmission over a BMS channel parametrized by $\epsilon$, using an LDPC(n, l, r) ensemble and decoding via an MP algorithm. Assume that the algorithm is symmetric in the sense of [1, Definition 4.81, p. 209]. Moreover, assume that this combination has a threshold and let $\epsilon^{\mathrm{MP}}$ denote this threshold. If $\epsilon < \epsilon^{\mathrm{MP}}$ then, under the conditions stated in Sections II and III,

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\mathrm{MP}}(G,\epsilon,\ell)] = 0.$$

Instead of considering just an exchange of limits one can consider joint limits where the number of iterations is an arbitrary but increasing function of the blocklength, i.e., one can consider $\lim_{n\to\infty}\mathbb{E}[P_b^{\mathrm{MP}}(G,\epsilon,\ell(n))]$. Our arguments extend to this case and one can show that

$$\limsup_{n\to\infty}\mathbb{E}[P_b^{\mathrm{MP}}(G,\epsilon,\ell(n))] = 0.$$

But for the sake of simplicity we restrict ourselves to the standard exchange of limits discussed above. In the same spirit, although some of the techniques and statements we discuss extend directly to the irregular case, in order to keep the exposition simple we restrict our discussion to the standard regular ensemble LDPC(n, l, r).

C. Outline 

We introduce two techniques that are useful in our context. First, we consider expanders. More precisely, in Section II we show that for codes with sufficient expansion the exchange of limits is valid below the DE threshold. The advantage of using expansion is that the argument applies to a wide variety of decoders. On the negative side, the argument can only be applied to ensembles with large variable-node degrees.

Why does expansion help in proving the desired result 
and why do we need large variable-node degrees? Assume 
that a sufficient number of iterations has been performed 
so that the number of still erroneous messages is relatively 
small. Consider further iterations. There are two reasons why 
a message emitted by a variable node can be bad. This can be 
due to the received value, or it can be due to a large number of 
bad incoming messages. If the degree of the variable node is 
large then the received value becomes less and less important 
(think of a node of degree 1000 and a decoder with a finite 
number of messages; in this case the received value has 
only a limited influence on the outgoing message and this 
message is mostly determined by the 999 incoming messages). 
If we therefore ignore the received value, we see that expansion helps, since it can guarantee that only few nodes have many bad incoming messages; otherwise the set of nodes with bad outgoing messages would have too few neighbors for the graph to be an expander.

If the variable nodes have small degree, then the received values play a significant role and can no longer be ignored. Therefore, for small degrees expansion arguments do not suffice by themselves. In Section III we concentrate on the case $l = 3$. This is the smallest degree that is meaningful for all the decoders that we consider, and so one can think of it as the most difficult general case. Except for the BEC, this case is not covered by a simple expansion argument and the techniques are more involved.



II. Sufficient Conditions Based on Expansion Arguments

Burshtein and Miller were the first to realize that expansion 
arguments can be applied not only to the flipping algorithm 
but also to show that certain MP algorithms have a fixed error 
correcting radius [5]. Although their results can be applied 
directly to our problem, we get stronger statements by using 
the expansion in a slightly different manner. 

A. Definitions and Review 

Definition 1 (Expansion): Let $G$ be an element from LDPC(n, l, r).

1) Left Expander: The graph $G$ is an $(l, r, \alpha, \gamma)$ left expander if for every subset $V$ of at most $\alpha n$ variable nodes, the number of check nodes that are connected to $V$ is at least $\gamma|V|l$.

2) Right Expander: Let $m = nl/r$. The graph $G$ is an $(l, r, \alpha, \gamma)$ right expander if for every subset $C$ of at most $\alpha m$ check nodes, the number of variable nodes that are connected to $C$ is at least $\gamma|C|r$.
Why are we using expansion arguments in the context of 
standard LDPC ensembles? It is well known that such codes 
are good expanders with high probability [5]. 

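Theorem 2 (Expansion of Random Graphs [5]): Let $G$ be chosen uniformly at random from LDPC(n, l, r). Let $\alpha_{\max}$ be the positive solution of the equation

$$\frac{l-1}{l}h_2(\alpha) - \frac{1}{r}h_2(\alpha\gamma r) - \alpha\gamma r\, h_2\!\left(\frac{1}{\gamma r}\right) = 0.$$

Let $\mathcal{X}(l, r, \alpha, \gamma)$ denote the set of graphs

$$\{G \in \mathrm{LDPC}(n, l, r) : G \text{ is an } (l, r, \alpha, \gamma) \text{ left expander}\}.$$

If $\gamma < 1 - \frac{1}{l}$ then $\alpha_{\max}$ is strictly positive and for $\alpha < \alpha_{\max}$

$$P\{G \in \mathcal{X}(l, r, \alpha, \gamma)\} \ge 1 - O\!\left(n^{-(l(1-\gamma)-1)}\right). \qquad (5)$$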

Let $m = nl/r$. We get the equivalent result for right expanders by exchanging the roles of $l$ and $r$ as well as $n$ and $m$.
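The quantity $\alpha_{\max}$ in Theorem 2 has no closed form, but it is easy to find numerically. The sketch below is a minimal illustration with hypothetical function names. It assumes natural logarithms (the base cancels, because every term carries one $h_2$ factor) and $\gamma r > 1$, so that $h_2(1/(\gamma r))$ is defined. For small $\alpha$ the left-hand side is positive when $\gamma < 1 - 1/l$, and at $\alpha = 1/(\gamma r)$ it equals $-h_2(1/(\gamma r))/l < 0$, so bisection on that interval converges to a sign change.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in nats; the value 0 is used at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log1p(-p)


def expansion_exponent(alpha: float, l: int, r: int, gamma: float) -> float:
    """Left-hand side of the equation defining alpha_max in Theorem 2."""
    return ((l - 1) / l) * h2(alpha) \
        - h2(alpha * gamma * r) / r \
        - alpha * gamma * r * h2(1.0 / (gamma * r))


def alpha_max(l: int, r: int, gamma: float, iters: int = 200) -> float:
    """Bisection for a positive root of expansion_exponent on (0, 1/(gamma r))."""
    if not 0.0 < gamma < 1.0 - 1.0 / l:
        raise ValueError("Theorem 2 requires 0 < gamma < 1 - 1/l")
    if gamma * r <= 1.0:
        raise ValueError("this sketch assumes gamma * r > 1")
    lo, hi = 1e-200, 1.0 / (gamma * r)   # exponent > 0 near 0 and < 0 at hi
    if expansion_exponent(lo, l, r, gamma) <= 0.0:
        raise ValueError("lower end of the bracket is not positive")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expansion_exponent(mid, l, r, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo


print(alpha_max(5, 6, 0.5))
```

For instance, for $(l, r) = (5, 6)$ and $\gamma = 0.5$ the root lies between 0.02 and 0.04, i.e., only sets of a few percent of the variable nodes are guaranteed to expand.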

As explained before, the idea is to show that the error 
probability goes to zero once the number of bad messages 
becomes smaller than a certain threshold. To make this more 
concrete we need a proper definition of "good" message 
subsets. 

Definition 3 (Good Message Subsets): For a fixed $(l, r)$-regular ensemble and a fixed MP decoder with message alphabet $\mathcal{M}$, let $\beta$, $0 < \beta \le 1$, be such that $\beta(l-1) \in \mathbb{N}$. A "good" pair of subsets of $\mathcal{M}$ of "strength" $\beta$ is a pair of subsets $(\mathcal{G}_v, \mathcal{G}_c)$ so that

• if at least $\beta(l-1)$ of the $(l-1)$ incoming messages at a variable node belong to $\mathcal{G}_v$, then the outgoing message on the remaining edge is in $\mathcal{G}_c$;

• if all the $(r-1)$ incoming messages at a check node belong to $\mathcal{G}_c$, then the outgoing message on the remaining edge is in $\mathcal{G}_v$;

• if at least $\beta(l-1) + 1$ of all $l$ incoming messages belong to $\mathcal{G}_v$, then the variable is decoded correctly.

We denote the probability of the bad message set $\mathcal{M} \setminus \mathcal{G}_v$ after $\ell$ iterations of DE by $p_{\mathrm{bad}}^{(\ell)}$.
As we will see shortly, for many MP decoders of interest the sets $\mathcal{G}_v$ and $\mathcal{G}_c$ can be chosen to be equal. This is true for all those MP decoders where the outgoing reliability at a check node is equal to the least reliability of all the incoming messages (we call them min-sum-type decoders). For such decoders, if all incoming messages at a check node are good (meaning they are correct and have sufficiently large reliability), then the outgoing message is correct and also has sufficiently large reliability. The BP decoder is an interesting case where $\mathcal{G}_v \neq \mathcal{G}_c$. For this decoder the reliability of the outgoing message at a check node is strictly smaller than the smallest reliability of all incoming messages. Therefore, we need to define the set $\mathcal{G}_c$ to consist of messages of strictly higher reliability than the messages in $\mathcal{G}_v$.

Definition 4 (Good Nodes): We call a variable or check 
node "good" if all of its outgoing messages are good. All 
other nodes are called "bad." 

Example 5 (BEC and BP): If at least 1 of the $(l-1)$ messages entering a variable node is known, then the outgoing message is known, and if at least 1 of the $l$ messages entering a variable node is known, then the variable itself is known. Further, if all of the $(r-1)$ incoming messages entering a check node are known, then the outgoing message is known. We conclude that good is equivalent to known and that $\beta = \frac{1}{l-1}$.
As a second standard example we consider transmission over the BSC($\epsilon$) and decoding via the so-called Gallager Algorithm B (GalB).

Definition 6 (Gallager Algorithm B): Messages are elements of $\{\pm 1\}$. The initial messages from the variable nodes to the check nodes are the values received via the channel. The decoding process proceeds in iterations with the following processing rules:

Check-Node Processing: At a check node the outgoing 
message along a particular edge is the product of the 
incoming messages along all the remaining edges. 
Variable-Node Processing: At a variable node the out- 
going message along a particular edge is equal to the 
majority vote on the set of other incoming messages and 
the received value. Ties are resolved randomly. 
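The two processing rules of Definition 6 can be transcribed directly. The sketch below is a minimal illustration under our own assumptions: the graph is given as a list of check-node neighborhoods (a hypothetical input format) without parallel edges, $+1$ stands for bit 0, at least one iteration is run, and the final bit decision is the majority of the received value and all $l$ incoming check messages, again with random tie-breaking. Definition 6 only specifies the message updates, so this decision rule is an assumption.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def gallager_b(checks: Sequence[Sequence[int]], n: int, received: Sequence[int],
               max_iter: int, rng: Optional[random.Random] = None) -> List[int]:
    """Decode with Gallager Algorithm B.

    checks   -- checks[c] lists the variable nodes attached to check node c
    received -- channel values in {+1, -1}; +1 stands for bit 0
    Returns hard decisions (bits) after max_iter >= 1 iterations.
    """
    if max_iter < 1:
        raise ValueError("run at least one iteration")
    rng = rng or random.Random(0)
    var_nbrs: List[List[int]] = [[] for _ in range(n)]
    for c, nbrs in enumerate(checks):
        for v in nbrs:
            var_nbrs[v].append(c)

    def majority(votes: List[int]) -> int:
        total = sum(votes)
        if total == 0:                       # tie: resolve at random
            return rng.choice((-1, 1))
        return 1 if total > 0 else -1

    # Initial variable-to-check messages are the received values.
    v2c: Dict[Tuple[int, int], int] = {
        (v, c): received[v] for v in range(n) for c in var_nbrs[v]}
    c2v: Dict[Tuple[int, int], int] = {}
    for _ in range(max_iter):
        # Check-node processing: product of the incoming messages on the other edges.
        for c, nbrs in enumerate(checks):
            for v in nbrs:
                prod = 1
                for u in nbrs:
                    if u != v:
                        prod *= v2c[(u, c)]
                c2v[(c, v)] = prod
        # Variable-node processing: majority of the received value and the other messages.
        for v in range(n):
            for c in var_nbrs[v]:
                votes = [received[v]] + [c2v[(d, v)] for d in var_nbrs[v] if d != c]
                v2c[(v, c)] = majority(votes)
    # Hard decision (our assumption): majority of the received value and all l messages.
    return [0 if majority([received[v]] + [c2v[(c, v)] for c in var_nbrs[v]]) == 1 else 1
            for v in range(n)]


# (3, 3)-regular toy graph: the checks are the four triples of {0, 1, 2, 3}.
toy_checks = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(gallager_b(toy_checks, 4, [-1, 1, 1, 1], max_iter=2))   # [0, 0, 0, 0]
```

On this toy graph, whose only codeword is the all-zero word, a single flipped bit is corrected after two iterations; after the first iteration one of the three bad check messages has already been repaired by the majority rule at the flipped node.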



Example 7 (BSC and GalB): Assume that the received value (via the channel) is incorrect. In this case at least $\lceil (l-1)/2 \rceil + 1$ of the $(l-1)$ incoming messages must be correct to ensure that the outgoing message is correct. If at least $\lceil (l-1)/2 \rceil + 2$ of the $l$ incoming messages are correct, then the variable is decoded correctly. (In fact, it is sufficient to have $\lfloor (l-1)/2 \rfloor + 2$ correct incoming messages to be able to decode correctly.) Therefore, good is equivalent to correct and $\beta = \frac{\lceil (l-1)/2 \rceil + 1}{l-1}$.
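The counts in Example 7 can be checked by brute force. The sketch below (function name hypothetical) assumes, as in Example 7, that the received value is wrong, and that a message or decision is only guaranteed to be correct when the correct votes form a strict majority, because ties are broken at random.

```python
from fractions import Fraction
from typing import Tuple


def galb_requirements(l: int) -> Tuple[int, int]:
    """Return (k_out, k_dec) for a GalB variable node of degree l >= 3 with a wrong received value.

    k_out: fewest correct messages among the l-1 other incoming ones that force a correct
           outgoing message (votes: l-1 messages plus the wrong received value).
    k_dec: fewest correct messages among all l incoming ones that force a correct
           decision (votes: l messages plus the wrong received value).
    """
    # Strict majority: correct votes must exceed wrong messages plus the wrong received value.
    k_out = next(k for k in range(l) if k > (l - 1 - k) + 1)
    k_dec = next(k for k in range(l + 1) if k > (l - k) + 1)
    return k_out, k_dec


for l in range(3, 10):
    k_out, k_dec = galb_requirements(l)
    print(l, k_out, k_dec, Fraction(k_out, l - 1))   # last column: beta
```

The enumeration reproduces $k_{\mathrm{out}} = \lceil (l-1)/2 \rceil + 1$ and $k_{\mathrm{dec}} = \lfloor (l-1)/2 \rfloor + 2$, hence $\beta = 1$ for $l = 3$ and $\beta = 3/4$ for $l = 5$.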



B. Expansion and Bit Error Probability 

Theorem 8 (Expansion and Bit Error Probability): Consider an LDPC(n, l, r) ensemble, transmission over a BMS($\epsilon$) channel, and a symmetric MP decoder. Let $\beta$ be the strength of the good message subset. If $\beta < 1$ and if for some $\epsilon$, $p_{\mathrm{bad}}^{(\infty)}(\epsilon) = 0$, then

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}_{\mathrm{LDPC}(n,l,r)}[P_b^{\mathrm{MP}}(G,\epsilon,\ell)] = 0. \qquad (6)$$
Proof: Here is the idea of the proof: we first run the MP algorithm for a fixed number of iterations such that the bit error probability is sufficiently small, say $p$. If the length $n$ is sufficiently large then we can use DE to gauge the number of required iterations. Then, using the expansion properties of the graph, we show that the probability of error stays close to $p$ for any number of further iterations. In particular, we show that the error probability never exceeds $cp$, where $c$ is a constant that only depends on the degree distribution and $\beta$. Since $p$ can be chosen arbitrarily small, the claim follows.
Here is the fine print. Define

$$\gamma = \frac{1+\beta}{2}\left(1 - \frac{1}{l}\right) < 1 - \frac{1}{l}. \qquad (7)$$



Let $0 < \alpha < \alpha_{\max}(\gamma)$, where $\alpha_{\max}(\gamma)$ is the function defined in Theorem 2. Let $p = \frac{\alpha(\gamma l - \beta(l-1))}{2l}$ and let $\ell(p)$ be the number of iterations such that $p_{\mathrm{bad}}^{(\ell(p))} < p$. Since $p_{\mathrm{bad}}^{(\infty)} = 0$ and $p > 0$, this is possible. Let $P_e(G, E, \ell)$ denote the fraction of messages belonging to the bad set after $\ell$ iterations. Let $\Omega$ denote the space of code and noise realizations. Let $A \subseteq \Omega$ denote the subset

$$A = \{(G, E) \in \Omega \mid P_e(G, E, \ell(p)) \le 2p\}. \qquad (8)$$



From (the Concentration) Theorem 39 we know that

$$P\{(G, E) \notin A\} \le 2e^{-Knp^2} \qquad (9)$$

for some strictly positive constant $K = K(l, r, p)$. In words, for most (sufficiently large) graphs and noise realizations the error probability after a fixed number of iterations is close to that of the asymptotic ensemble. We now show that once the error probability is sufficiently small it never increases substantially thereafter if the graph is an expander, regardless of how many further iterations we perform.

Let $V_0 \subseteq [n]$ be the initial set of bad variable nodes. More precisely, $V_0$ is the set of all variable nodes that are bad in the $\ell(p)$-th iteration. We claim that $|V_0| \le \frac{2pl}{l - \beta(l-1)}\,n$. (This is because for a variable node to be bad it must have at least $l - \beta(l-1)$ bad incoming messages, while on $A$ at most a fraction $2p$ of the $nl$ messages are bad.) As we just discussed, for most graphs and noise realizations this is the case. As a worst case we assume that all outgoing edges of a bad variable node are bad. Let the set of check nodes connected to $V_0$ be $C_0$. These are the only check nodes that can potentially send bad messages in the next iteration. Therefore, we call $C_0$ the initial set of bad check nodes. Clearly,

$$|C_0| \le l|V_0|. \qquad (10)$$



Consider a variable node and a fixed edge $e$ connected to it: the outgoing message along $e$ is determined by the received value as well as by the $(l-1)$ incoming messages along the other $(l-1)$ edges. Recall that if $\beta(l-1)$ of those messages are good then the outgoing message along edge $e$ is good. Therefore, if a variable node has $\beta(l-1) + 1$ good incoming messages, then all outgoing messages are good. We conclude that for a variable node to be bad at least $l - \beta(l-1)$ incoming messages must be bad. Therefore, it must be connected to at least $l - \beta(l-1)$ bad check nodes. This leaves at most $\beta(l-1)$ edges that are connected to new check nodes.

We want to count the number of bad variables that are 
created in any of the future iterations. For convenience, once 
a variable becomes bad we will consider it to be bad for all 
future iterations. This implies that the set of bad variables is 
non-decreasing. 

Let us now bound the number of bad variable nodes by the following process. The process proceeds in discrete steps. At each step $t$, consider the set of variables that are not contained in $V_t$ but that are connected to at least $l - \beta(l-1)$ check nodes in $C_t$ (the set of "bad" check nodes). If at time $t$ no such variable exists, stop the process. Otherwise, choose one such variable at random and add it to $V_t$. This gives us the set $V_{t+1}$. We also add all neighbors of this variable to $C_t$. This gives us the set $C_{t+1}$. In this way we add to $V_t$ and $C_t$, respectively, the variable nodes that can potentially become bad and the check nodes that can potentially send bad messages. As discussed above, for a good variable to become bad it must be connected to at least $l - \beta(l-1)$ check nodes that are connected to bad variable nodes. Therefore, at most $\beta(l-1)$ new check nodes are added in each step. Hence, if the process continues then

$$|V_{t+1}| = |V_t| + 1, \qquad (11)$$

$$|C_{t+1}| \le |C_t| + \beta(l-1). \qquad (12)$$



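By assumption, the graph is an element of $\mathcal{X}(l, r, \alpha, \gamma)$. Initially we have $|V_0| \le \frac{2pl}{l - \beta(l-1)}\,n = \frac{\gamma l - \beta(l-1)}{l - \beta(l-1)}\,\alpha n \le \alpha n$. Therefore, as long as $|V_t| \le \alpha n$,

$$\gamma l|V_t| \le |C_t|, \qquad (13)$$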



since $C_t$ contains all neighbors of $V_t$. Let $T$ denote the stopping time of the process, i.e., the smallest time at which no new variable can be added to $V_t$. We will now show that the stopping time is finite. We have

$$\gamma l(|V_0| + t) = \gamma l|V_t| \le |C_t| \le |C_0| + t\beta(l-1) \le l|V_0| + t\beta(l-1).$$

Solving for $t$ gives us

$$T \le \frac{l(1-\gamma)|V_0|}{\gamma l - \beta(l-1)}.$$



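Therefore,

$$|V_T| \le |V_0| + \frac{l(1-\gamma)|V_0|}{\gamma l - \beta(l-1)} = \frac{l - \beta(l-1)}{\gamma l - \beta(l-1)}\,|V_0| \le \frac{2pl}{\gamma l - \beta(l-1)}\,n = \alpha n, \qquad (14)$$

where in the second-to-last step we used the fact that $|V_0| \le \frac{2pl}{l - \beta(l-1)}\,n$. The whole derivation so far was based on the assumption that $|V_t| \le \alpha n$ for $0 \le t \le T$. But as we can see from the above equation, this condition is indeed verified ($|V_t|$ is non-decreasing and $|V_T| \le \alpha n$).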
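Putting all these things together, we get

$$\begin{aligned}
\mathbb{E}[P_b^{\mathrm{MP}}(G,E,\ell)] &= \mathbb{E}\big[P_b^{\mathrm{MP}}(G,E,\ell)\big(\mathbb{1}_{\{(G,E)\in A\}} + \mathbb{1}_{\{(G,E)\notin A\}}\big)\big]\\
&\le \mathbb{E}\big[P_b^{\mathrm{MP}}(G,E,\ell)\,\mathbb{1}_{\{(G,E)\in A\}}\big] + P\{(G,E)\notin A\}\\
&\le \mathbb{E}\big[P_b^{\mathrm{MP}}(G,E,\ell)\,\mathbb{1}_{\{(G,E)\in A\}}\mathbb{1}_{\{G\in\mathcal{X}(l,r,\alpha,\gamma)\}}\big] + P\{G\notin\mathcal{X}(l,r,\alpha,\gamma)\} + P\{(G,E)\notin A\}.
\end{aligned}$$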

Apply $\limsup_{\ell\to\infty}$ on both sides of the inequality. According to (14), the first term is bounded by $\alpha$. For the second term, since $\gamma < 1 - \frac{1}{l}$, we know from Theorem 2 that it is upper bounded by $O(n^{-(l(1-\gamma)-1)})$. For the third term we know from (9) that it is bounded by $2e^{-Knp^2}$ for some strictly positive constant $K = K(l, r, p)$. Therefore, if we subsequently apply the limit $\lim_{n\to\infty}$, we get

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\mathrm{MP}}(G,\epsilon,\ell)] \le \alpha.$$

Since this conclusion is valid for any $0 < \alpha < \alpha_{\max}$, it follows that

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\mathrm{MP}}(G,\epsilon,\ell)] = 0.$$

■
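The constants chosen in the proof of Theorem 8 can be checked with exact rational arithmetic. The sketch below (function and class names hypothetical) takes the quantities of the proof as written here: $\gamma$ from (7), $p = \alpha(\gamma l - \beta(l-1))/(2l)$, the bound $|V_0| \le 2pln/(l - \beta(l-1))$ on the event $A$, the bound on $T$, and (14). It only verifies the algebra, namely that the final bound on $|V_T|/n$ equals $\alpha$; it does not construct or test an actual expander.

```python
from fractions import Fraction
from typing import NamedTuple


class ProofConstants(NamedTuple):
    gamma: Fraction
    p: Fraction
    v0_bound: Fraction      # bound on |V_0| / n
    t_bound: Fraction       # bound on T / n
    vt_bound: Fraction      # bound on |V_T| / n, cf. (14)


def theorem8_constants(l: int, beta: Fraction, alpha: Fraction) -> ProofConstants:
    if not 0 < beta < 1:
        raise ValueError("Theorem 8 assumes 0 < beta < 1")
    b = beta * (l - 1)                             # beta(l-1)
    gamma = (1 + beta) / 2 * (1 - Fraction(1, l))  # (7)
    slack = gamma * l - b                          # equals (1-beta)(l-1)/2 > 0
    p = alpha * slack / (2 * l)
    v0 = 2 * p * l / (l - b)                       # bound on |V_0| / n on the event A
    # From gamma*l*(|V_0| + t) <= l*|V_0| + t*beta(l-1):
    t_max = l * (1 - gamma) * v0 / slack
    return ProofConstants(gamma, p, v0, t_max, v0 + t_max)


print(theorem8_constants(5, Fraction(3, 4), Fraction(1, 100)))
```

For $l = 5$, $\beta = 3/4$ and $\alpha = 1/100$ this gives $\gamma = 7/10$, $p = 1/2000$, $|V_0|/n \le 1/400$, $T/n \le 3/400$ and $|V_T|/n \le 1/100 = \alpha$.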



Example 9 (BEC and BP): We know from Example 5 that $\beta(l-1) = 1$. If we apply the conditions of Theorem 8 we see that we require $1/(l-1) < 1$. Hence, the exchange of the limits is valid for $l \ge 3$. Of course, for the BEC the exchange of limits in this regime follows directly from the monotonicity of the algorithm.

Example 10 (BSC and GalB): We know from Example 7 that $\beta(l-1) = \lceil (l-1)/2 \rceil + 1$. From Theorem 8, if $\epsilon < \epsilon^{\mathrm{GalB}}$, the limits can be exchanged if $l - 1 > 1 + \lceil (l-1)/2 \rceil$, i.e., for $l \ge 5$.

The key to applying expansion arguments to decoders with a continuous alphabet is to ensure that the received values are no longer dominant once DE has reached small error probabilities. This can be achieved by ensuring that the input alphabet (the range of the channel log-likelihoods) is smaller than the message alphabet.

Definition 11 (Bounded MP Decoders): Given an MP decoder whose message-passing alphabet is unbounded, i.e., equal to $\mathbb{R}$, we associate to it a bounded version. The bounded MP decoder with parameter $M \in \mathbb{R}^+$, denoted by MP(M), is identical to the standard MP decoder except that the reliability of the messages emitted by the check nodes is bounded to $M$ before the messages are forwarded to the variable nodes. Note that the outgoing messages from the check nodes lie in $[-M, M]$ while the outgoing messages from the variable nodes can lie outside this range.

Example 12 (MS(M), BP(M) Decoders): The MS(M) decoder and the BP(M) decoder are identical to the standard min-sum (MS) and belief propagation (BP) decoders, except that the reliability of the messages emitted by the check nodes is bounded to $M$ before the messages are forwarded to the variable nodes.
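The two bounded decoders of Example 12 differ only in the check-node map. The sketch below writes both in terms of log-likelihood ratios, with hypothetical function names; the tanh rule for BP and the sign-times-minimum rule for MS are the standard ones, and the clipping to $[-M, M]$ is the bounding step of Definition 11. It also illustrates the remark above that the BP output reliability is strictly smaller than the smallest incoming reliability, while for MS it is equal to it.

```python
import math
from typing import Sequence

# Largest double below 1; clamping the tanh product to it keeps atanh finite.
_ATANH_LIMIT = 1.0 - 2.0 ** -53


def bp_check(incoming: Sequence[float], M: float) -> float:
    """BP(M) check-node output: tanh rule, then the reliability is clipped to M."""
    prod = 1.0
    for llr in incoming:
        prod *= math.tanh(llr / 2.0)
    prod = max(-_ATANH_LIMIT, min(_ATANH_LIMIT, prod))
    return max(-M, min(M, 2.0 * math.atanh(prod)))


def ms_check(incoming: Sequence[float], M: float) -> float:
    """MS(M) check-node output: product of signs times smallest reliability, clipped to M."""
    sign = 1.0
    for llr in incoming:
        if llr < 0:
            sign = -sign
    return max(-M, min(M, sign * min(abs(llr) for llr in incoming)))


print(ms_check([3.0, -7.0, 9.0], 5.0))   # -3.0
print(bp_check([1.0, 2.0], 10.0))        # about 0.735, below min(1, 2)
```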

Example 13 (MS(5) Decoder): Consider an $(l \ge 5, r)$-regular ensemble and fix $M = 5$. Let the channel log-likelihoods belong to $[-1, 1]$. It is easy to check that in this case we can choose $\mathcal{G}_v = \mathcal{G}_c = [4, 5]$ and that it has strength $\beta \le 4/5$. Therefore, if the probability of outgoing messages from check nodes being in $[4, 5]$ goes to 1 under DE, then according to Theorem 8 the limits can be exchanged.

For example, consider the BSC($\epsilon$) and the $(5, 6)$-regular ensemble. It is known that for this channel and the MS decoder the messages are of the form $k\log\frac{1-\epsilon}{\epsilon}$ for $k \in \mathbb{Z}$. Therefore we can restrict the message space to $\mathbb{Z}$ with the channel values mapped to $\{\pm 1\}$. Now, if we consider the MS(5) decoder, the messages belong to $\{-5, \dots, 5\}$. For this decoder, we can show that the limits can be exchanged up to the DE threshold of 0.067.
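The strength claimed in Example 13 can be checked under a worst-case model that is our assumption: a message counts as good if it has the correct sign and reliability at least 4, every bad incoming message equals $-5$ (the most negative MS(5) check output), and the channel value is $-1$. Only the lower end of the interval is checked, i.e., that $\beta(l-1)$ good inputs force a variable-node output of at least 4. The function name is hypothetical.

```python
from fractions import Fraction


def ms_strength(l: int, M: int = 5, good_low: int = 4, channel_max: int = 1) -> Fraction:
    """Smallest beta = k/(l-1) such that k good inputs force a variable output >= good_low."""
    for k in range(l):
        # k inputs at the lower end of the good set, the others at -M, the channel at -channel_max
        worst_output = k * good_low - (l - 1 - k) * M - channel_max
        if worst_output >= good_low:
            return Fraction(k, l - 1)
    raise ValueError("no strength <= 1 for these parameters")


for l in range(4, 11):
    beta = ms_strength(l)
    print(l, beta, "Theorem 8 applies" if beta < 1 else "beta = 1")
```

Under this model $\beta = 3/4$ for $l = 5$, the largest value over $l \ge 5$ is $\beta = 4/5$ (at $l = 6$), and $l = 4$ gives $\beta = 1$, consistent with Example 13 starting at $l = 5$. For $l \ge 7$ the values also satisfy $\beta < (l-2)/(l-1)$, the condition used for the block error probability in Section II-C.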
Example 14 (BP(10) Decoder): Let $l = 5$ and $r = 6$ and fix $M = 10$. Let the channel log-likelihoods belong to $[-3, 3]$. We claim that in this case the message subset pair $\mathcal{G}_v = [9, 10]$, $\mathcal{G}_c = [14, 43]$ is good with strength $\beta = 3/4$. This can be seen as follows: if all the incoming messages to a check node belong to $\mathcal{G}_c$, then the outgoing message is at least 12.39, which is mapped down to 10. Suppose that at a variable node at least 3 ($= \beta(l-1)$) out of the 4 incoming messages belong to $\mathcal{G}_v$. In this case the reliability of the outgoing message is at least $14 = 3 \times 9 - 10 - 3$. The maximum reliability is $43 = 4 \times 10 + 3$. Moreover, if at least 4 ($= \beta(l-1) + 1$) of the 5 incoming messages belong to $\mathcal{G}_v$, then the decision has reliability at least $4 \times 9 - 10 - 3 = 23 > 0$ and the variable is decoded correctly. Therefore, if the probability of outgoing messages from check nodes being in $[9, 10]$ goes to 1 in the DE limit, then from Theorem 8 the limits can be exchanged.

For example, consider the BSC($\epsilon$) with channel log-likelihoods restricted to $[-3, 3]$. For $\epsilon < \frac{1}{1+e^{3}}$, the log-likelihoods lie outside $[-3, 3]$ and hence they are mapped to $\{\pm 3\}$. In this case the limits can be exchanged up to the DE threshold of 0.136. Note that this is what is done in practice, since one has to work with bounded likelihoods.
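The numbers in Example 14 follow from the BP check-node rule $\tanh(L/2) = \prod_i \tanh(L_i/2)$ and from adding log-likelihoods at the variable node. The short script below recomputes them under a worst-case reading that is our assumption: good inputs sit at the lower end of their interval, bad inputs at $-M$, and the channel value at $-3$. Variable names are illustrative.

```python
import math

# Parameters of Example 14.
l, r, M, channel_max = 5, 6, 10, 3
gv_low, gc_low = 9, 14        # G_v = [9, 10], G_c = [14, 43]

# Check node: all r-1 inputs in G_c. The BP output grows with the input reliabilities,
# so the smallest output occurs when every input equals 14.
check_out = 2.0 * math.atanh(math.tanh(gc_low / 2.0) ** (r - 1))
check_out_bounded = min(check_out, M)

# Variable node: 3 = beta(l-1) of the 4 inputs are at least 9, the remaining input is -10
# and the channel value is -3 in the worst case.
k = 3
var_out_min = k * gv_low - (l - 1 - k) * M - channel_max
var_out_max = (l - 1) * M + channel_max
# Decision: k + 1 = 4 of the 5 incoming messages are at least 9.
decision_min = (k + 1) * gv_low - (l - 1 - k) * M - channel_max

# BSC log-likelihoods log((1 - eps)/eps) exceed 3 exactly when eps < 1/(1 + e^3).
eps_clip = 1.0 / (1.0 + math.exp(3.0))

print(round(check_out, 2), check_out_bounded, var_out_min, var_out_max, decision_min)
print(round(eps_clip, 4))
```

It reproduces the check output of about 12.39 (bounded to 10), the variable-node range from 14 to 43, and puts the clipping point at $\epsilon \approx 0.0474$.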

C. Expansion and Block Error Probability 

In the previous section we considered the bit error proba- 
bility. We will now derive sufficient conditions for the block 
error probability. Again we use expansion arguments but we 
proceed in a slightly different way. 

Theorem 15 (Expansion and Block Error Probability): Consider an LDPC(n, l, r) ensemble, transmission over a BMS($\epsilon$) channel, and a symmetric MP decoder. Let $\beta$ be the strength of the good message subset. If $\beta < \frac{l-2}{l-1}$ and if for some $\epsilon$, $p_{\mathrm{bad}}^{(\infty)}(\epsilon) = 0$, then

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}_{\mathrm{LDPC}(n,l,r)}[P_B^{\mathrm{MP}}(G,\epsilon,\ell)] = 0. \qquad (15)$$

Proof: As in Theorem 8, we first perform a fixed number of iterations to bring the bit error probability below a desired level. We then use Theorem 36 to show that for a graph with sufficient expansion the MP algorithm decodes the whole block correctly once the bit error probability is sufficiently small. This is very much in the spirit of Burshtein and Miller [5].
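Define

$$\gamma = \frac{(3+\beta)(l-1)+1}{4l},$$

the midpoint of the interval $\left(\frac{l+\beta(l-1)}{2l}, \frac{l-1}{l}\right)$, which is nonempty since $\beta(l-1) < l-2$; hence $\frac{\beta(l-1)}{l} < 2\gamma - 1$ and $\gamma < 1 - \frac{1}{l}$. Let $0 < \alpha < \alpha_{\max}(\gamma)$, where $\alpha_{\max}(\gamma)$ is the function defined in Theorem 2. Let $p = \frac{\alpha(l - \beta(l-1))}{2l}$ and let $\ell(p)$ be the number of iterations such that $p_{\mathrm{bad}}^{(\ell(p))} < p$. Let $\Omega$ denote the space of code and noise realizations. Let $P_e(G, E, \ell)$ denote the fraction of messages belonging to the bad set after $\ell$ iterations. Let $A \subseteq \Omega$ denote the subset

$$A = \{(G, E) \in \Omega \mid P_e(G, E, \ell(p)) \le 2p\}.$$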



From (the Concentration) Theorem 39 we know that

$$P\{(G, E) \notin A\} \le 2e^{-Knp^2} \qquad (16)$$

for some strictly positive constant $K = K(l, r, p)$.

Since $\frac{\beta(l-1)}{l} < 2\gamma - 1$, we can apply Theorem 36: if $G \in \mathcal{X}(l, r, \alpha, \gamma)$ and if the initial number of bad messages is sufficiently small (which holds on $A$ by the choice of $p$), then all the messages become good after a sufficient number of iterations.

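Putting all these things together, we get

$$\begin{aligned}
\mathbb{E}[P_B^{\mathrm{MP}}(G,E,\ell)] &= \mathbb{E}\big[P_B^{\mathrm{MP}}(G,E,\ell)\big(\mathbb{1}_{\{(G,E)\in A\}} + \mathbb{1}_{\{(G,E)\notin A\}}\big)\big]\\
&\le \mathbb{E}\big[P_B^{\mathrm{MP}}(G,E,\ell)\,\mathbb{1}_{\{(G,E)\in A\}}\big] + P\{(G,E)\notin A\}\\
&\le \mathbb{E}\big[P_B^{\mathrm{MP}}(G,E,\ell)\,\mathbb{1}_{\{(G,E)\in A\}}\mathbb{1}_{\{G\in\mathcal{X}(l,r,\alpha,\gamma)\}}\big] + P\{G\notin\mathcal{X}(l,r,\alpha,\gamma)\} + P\{(G,E)\notin A\}.
\end{aligned}$$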

Apply $\limsup_{\ell\to\infty}$ on both sides of the inequality. According to Theorem 36, the first term is 0. For the second term, since $\gamma < 1 - \frac{1}{l}$, we know from Theorem 2 that it is upper bounded by $O(n^{-(l(1-\gamma)-1)})$. For the third term we know from (16) that it is bounded by $2e^{-Knp^2}$ for some strictly positive constant $K = K(l, r, p)$. Therefore, if we subsequently apply the limit $\lim_{n\to\infty}$, we get

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_B^{\mathrm{MP}}(G,\epsilon,\ell)] = 0.$$

■

Example 16 (BEC and BP): According to Theorem 15 we require $\beta(l-1) = 1 < l - 2$, i.e., $l \ge 4$. Hence, if $l \ge 4$ then the block error probability tends to zero below the BP threshold.

Example 17 (BSC and GalB): As explained in Example 7, for the Gallager B algorithm over the BSC, $\beta(l-1) = 1 + \lceil (l-1)/2 \rceil$. The above condition implies that if $l - 2 > 1 + \lceil (l-1)/2 \rceil$, i.e., for $l \ge 7$, the block error probability goes to zero below $\epsilon^{\mathrm{GalB}}$.
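The degree thresholds in Examples 9, 10, 16 and 17 reduce to comparing $\beta(l-1)$ with $l-1$ (Theorem 8) and with $l-2$ (the inequality used in Example 17). A small sketch, with hypothetical function names and decoder labels, that finds the smallest admissible $l$ in each case:

```python
from typing import Callable


def strength_count(decoder: str, l: int) -> int:
    """beta(l-1) for BEC/BP (Example 5) and BSC/GalB (Example 7)."""
    if decoder == "bec-bp":
        return 1
    if decoder == "bsc-galb":
        return l // 2 + 1          # equals ceil((l-1)/2) + 1
    raise ValueError(f"unknown decoder: {decoder!r}")


def bit_condition(decoder: str, l: int) -> bool:
    """Theorem 8: beta < 1, i.e. beta(l-1) < l-1 (Examples 9 and 10)."""
    return strength_count(decoder, l) < l - 1


def block_condition(decoder: str, l: int) -> bool:
    """Condition of Examples 16 and 17: beta(l-1) < l-2."""
    return strength_count(decoder, l) < l - 2


def smallest_degree(condition: Callable[[str, int], bool], decoder: str,
                    l_max: int = 100) -> int:
    return next(l for l in range(2, l_max) if condition(decoder, l))


for dec in ("bec-bp", "bsc-galb"):
    print(dec, smallest_degree(bit_condition, dec), smallest_degree(block_condition, dec))
```

The output matches the examples: $l \ge 3$ and $l \ge 4$ for the BEC, $l \ge 5$ and $l \ge 7$ for GalB.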

Example 18 (MS(5) Decoder): Consider an $(l \ge 7, r)$-regular ensemble and fix $M = 5$. Let the channel log-likelihoods belong to $[-1, 1]$. It is easy to check that in this case we can choose $\mathcal{G}_v = \mathcal{G}_c = [4, 5]$ and that it has strength $\beta \le 4/5$. Therefore, if the probability of outgoing messages from check nodes being in $[4, 5]$ goes to 1 under DE, then according to Theorem 15 the block error probability tends to 0.

Example 19 (BP(10) Decoder): Let $l = 7$ and $r = 8$ and fix $M = 10$. Let the channel log-likelihoods belong to $[-1, 1]$. We claim that in this case the message subset pair $\mathcal{G}_v = [9, 10]$, $\mathcal{G}_c = [15, 59]$ is good with strength $\beta = 2/3$. Therefore, if the probability of outgoing messages from check nodes being in $[9, 10]$ goes to 1 in the DE limit, then from Theorem 15 the block error probability goes to zero.
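The strength in Example 19 can be recomputed the same way as the numbers in Example 14, under the same worst-case assumptions (good inputs at the lower end of their interval, bad inputs at $-10$, channel value at $-1$). Variable names are illustrative.

```python
import math

# Parameters of Example 19.
l, r, M, channel_max = 7, 8, 10, 1
gv_low, gc_low = 9, 15        # G_v = [9, 10]; the lower end of G_c is 15

# Check node: r-1 = 7 inputs, all at least 15; the output still exceeds M = 10.
check_out = 2.0 * math.atanh(math.tanh(gc_low / 2.0) ** (r - 1))

# Variable node: fewest inputs in G_v (out of l-1 = 6) that push the output to >= 15
# when the other inputs are -10 and the channel value is -1.
k_min = next(k for k in range(l)
             if k * gv_low - (l - 1 - k) * M - channel_max >= gc_low)
beta = k_min / (l - 1)

print(round(check_out, 2), k_min, round(beta, 4))
```

This gives a check output of about 13.05 (bounded to 10) and $k_{\min} = 4$ of 6, i.e., $\beta = 2/3$, which satisfies $\beta < (l-2)/(l-1) = 5/6$.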

Theorem 15 has a stronger implication than Theorem 8 since it concerns the block error probability. Unfortunately, the required conditions are considerably more restrictive. We conjecture that in fact the conditions of Theorem 15 can be weakened by considering several stages of the algorithm jointly and that the required conditions are identical to the ones in Theorem 8.

Conjecture 20 (Expansion and Block Error Probability): Consider an LDPC(n, l, r) ensemble, transmission over a BMS($\epsilon$) channel, and a symmetric MP decoder. Let $\beta$ be the strength of the good message subset. If $\beta < 1$ and if for some $\epsilon$, $p_{\mathrm{bad}}^{(\infty)}(\epsilon) = 0$, then

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}_{\mathrm{LDPC}(n,l,r)}[P_B^{\mathrm{MP}}(G,\epsilon,\ell)] = 0. \qquad (17)$$

III. Sufficient Condition Based on Birth-Death Process

In the previous section we relied solely on the expansion of the graph to prove the validity of the limit exchange. As can be seen from the examples, for the decoders of interest the theorems are only valid for higher degrees, let's say $l \ge 5$. Practical codes, however, typically have small degrees. In these cases expansion by itself is not sufficient.

In more detail, the proofs in the previous section have two phases. In the first phase we run the MP algorithm for some fixed number of iterations to get the error probability down to a small constant. In the second phase we prove that the error probability stays close to this constant regardless of how many further iterations we perform, assuming pessimistically that all variable nodes have bad received values. This assumption is too pessimistic for small degrees, where the received value plays an important role. In this section, we develop a method that takes the actual channel realization into account.

Consider an MP decoder operating on a message alphabet $\mathcal{M} \subseteq \mathbb{R}$. Further, for $\mu \in \mathcal{M}$, define $|\mu|$ to be the reliability of the message. This means that we define the reliability of $-\mu$ to be the same as the reliability of $\mu$.

Most of the MP algorithms used in practice, such as GalB, BP, and MS, fall into the following category of monotone decoders.

Definition 21 (Monotone MP Decoders): We say that a symmetric MP decoder is monotone if the following conditions are fulfilled. At variable nodes the processing rules are monotone with respect to the natural order on $\mathcal{M}$; for a fixed received value, the outgoing message is a non-decreasing function of the incoming messages.

At check nodes the processing rules are monotone with respect to the natural order on the reliabilities; the reliability of the outgoing message is a non-decreasing function of the reliabilities of the incoming messages.
Monotonicity is a useful property and it is also quite natural. 
A remaining difficulty in analyzing these decoders is that at 
check nodes the monotonicity is with respect to the reliability 
and not the message itself. We will see shortly how to get 
around this problem. 

In what follows we mainly discuss the case of the GalB algorithm and $l = 3$. The generalization to degree $l \ge 4$ is straightforward and is discussed in Section III-H. In this section we further give some examples of other monotone decoders to which the method can be extended.

A. Main Result and Outline 

Lemma 22 (Exchange of Limits): Consider transmission over the BSC($\epsilon$) using random elements from the $(l, r)$-regular ensemble and decoding by the GalB algorithm. If $\epsilon < \epsilon^{\mathrm{LGalB}}$ then

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\mathrm{GalB}}(G,\epsilon,\ell)] = 0,$$

where $\epsilon^{\mathrm{LGalB}}$ is the smallest parameter $\epsilon$ for which a solution to the following fixed point equation exists in $(0, \epsilon]$:

[Equation (18): fixed point equation in $x$ for general $l$, written in terms of $y = (1-x)^{r-1}$; its full form is not legible in the source.] (18)

For the $(l = 3, r)$-regular ensemble this equation simplifies to

$$x = (1-\epsilon)\left(1-(1-x)^{r-1}\right)^2 + \epsilon\left(1-(1-x)^{2(r-1)}\right).$$
Discussion: Note that the threshold e LG " 1B introduced in the 
preceding lemma is in general slightly smaller than the DE 
threshold e GlilB . We pose the extension of the result to channel 
values up to the DE threshold as an interesting open problem. 
It is likely to be difficult. 



TABLE I
Threshold values for some degree distributions with l = 3.

r     rate      ε^Sha       ε^GalB      ε^LGalB
3     0.0       ≈ 0.5       ≈ 0.222     ≈ 0.1705
4     0.25      ≈ 0.2145    ≈ 0.1068    ≈ 0.0847
5     0.4       ≈ 0.1461    ≈ 0.06119   ≈ 0.0506
6     0.5       ≈ 0.11002   ≈ 0.0394    ≈ 0.0336
7     0.5714    ≈ 0.08766   ≈ 0.02751   ≈ 0.02398
8     0.625     ≈ 0.07245   ≈ 0.02027   ≈ 0.01795
9     0.667     ≈ 0.06141   ≈ 0.01554   ≈ 0.01395
10    0.7       ≈ 0.05324   ≈ 0.01229   ≈ 0.01115



r     rate     ε^Sha       ε^GalB      ε^LGalB
4     0.0      ≈ 0.5       ≈ 0.0840    ≈ 0.0697
5     0.2      ≈ 0.1461    ≈ 0.0464    ≈ 0.0399
6     0.333    ≈ 0.11002   ≈ 0.0292    ≈ 0.0258
7     0.4286   ≈ 0.08766   ≈ 0.0200    ≈ 0.018
8     0.5      ≈ 0.07245   ≈ 0.0146    ≈ 0.0133
9     0.556    ≈ 0.06141   ≈ 0.0111    ≈ 0.0102
10    0.6      ≈ 0.05324   ≈ 0.0087    ≈ 0.0081

TABLE II

Threshold values for some degree distributions with l = 4.

Example 23: Table I shows thresholds for l = 3, r =
3, ..., 10. For the (l = 3, r = 6) degree distribution we have
$\epsilon^{\text{LGalB}} \approx 0.0336$. This is slightly smaller than, but comparable
to, $\epsilon^{\text{GalB}} \approx 0.0394$.
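As a numerical cross-check of Table I, the threshold $\epsilon^{\text{LGalB}}$ of the $(3, r)$-regular ensemble can be located by bisection on $\epsilon$, testing whether $x = \bar{\epsilon}(1-(1-x)^{r-1})^2 + \epsilon(1-(1-x)^{2(r-1)})$ has a solution in $(0, \epsilon]$. This is a minimal sketch under two stated assumptions: the continuum $(0, \epsilon]$ is replaced by a finite grid (so the result is approximate), and bisection is justified because the right-hand side is non-decreasing in $\epsilon$ while the interval $(0, \epsilon]$ grows with $\epsilon$. The function names are hypothetical.

```python
def lgalb_map(x: float, eps: float, r: int) -> float:
    """Right-hand side of the (l = 3, r) LGalB fixed-point equation."""
    y = (1.0 - x) ** (r - 1)  # probability that a check-to-variable message is good
    return (1.0 - eps) * (1.0 - y) ** 2 + eps * (1.0 - y ** 2)


def has_fixed_point(eps: float, r: int, grid: int = 2000) -> bool:
    """Approximate test: does lgalb_map(x) >= x hold at some grid point x in (0, eps]?"""
    for i in range(1, grid + 1):
        x = eps * i / grid
        if lgalb_map(x, eps, r) >= x:
            return True
    return False


def lgalb_threshold(r: int, tol: float = 1e-6) -> float:
    """Bisection for the smallest eps that admits a fixed point in (0, eps]."""
    lo, hi = 0.0, 0.5  # at eps = 0.5 the map gives 1 - y >= 0.5 at x = 0.5, so hi brackets
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if has_fixed_point(mid, r):
            hi = mid
        else:
            lo = mid
    return hi


for r in range(3, 11):
    print(r, round(lgalb_threshold(r), 4))
```

For r = 6 the fixed point that appears first lies at the boundary x = ε, which is why the grid includes the endpoint ε itself.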

We proceed by a sequence of simplifications, ensuring in
each step that the modified algorithm is an upper bound
on the original process. In Section III-B we simplify the
decoder by "linearizing" the processing rules at the check
nodes. In Section III-C we further upper bound the process by
considering the marking process associated with the decoding
algorithm. In Section III-D we construct a witness for the
marking process and derive bounds on the size of such a
witness. In Section III-E we then show that, conditioned on
the witness, we can consider the channel realizations outside
the witness to be random and independent of the witness.
In Section III-F we use an expansion argument to bound the
stopping time of the birth and death process associated with
the marking process. Finally, in Section III-G we combine all
previous statements to arrive at our conclusion.

B. Linearized Gallager Algorithm B 

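We proceed as in Section II. Fix $0 < \epsilon < \epsilon^{\text{LGalB}}$. We
prove that for every $\alpha > 0$ there exists an $n(\alpha, \epsilon)$ so that
$\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\text{GalB}}(G, \epsilon, \ell)] < \alpha$ for $n > n(\alpha, \epsilon)$.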

Without loss of generality we can assume that the all-one 
codeword was sent. We will make this assumption throughout 
the remainder of this section. Therefore, in the sequel the
message 1 signifies a correct message, whereas −1 signifies that
the message is incorrect.

For this setting, we define the following linearized version 
of the decoder. 

Definition 24 (Linearized GalB): The linearized GalB
decoder, denoted by LGalB, is defined as follows: at the variable
node the computation rule is the same as that of the GalB decoder.
At the check node the outgoing message is the minimum of 
the incoming messages. 
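The node rules, and facts (i) and (ii) used in the proof of Lemma 25, can be checked exhaustively for small degrees. In this sketch messages are ±1 (+1 correct), the GalB check rule is the parity (product) of the incoming messages, and the degree-3 GalB variable rule is assumed to be the majority of the received value and the two incoming messages, i.e., the received value is flipped only if both incoming messages disagree with it. This is our reading, consistent with the (l = 3, r) fixed-point equation of Lemma 22; the function names are hypothetical. The check also exhibits why the GalB parity rule is not monotone in the messages themselves.

```python
from itertools import product


def galb_variable(received, incoming):
    """Degree-3 GalB variable rule (assumed): majority of the received value and the
    two incoming check messages, i.e., flip the received value only if both disagree."""
    return 1 if received + sum(incoming) > 0 else -1


def galb_check(incoming):
    """GalB check rule: product (parity) of the incoming +/-1 messages."""
    out = 1
    for m in incoming:
        out *= m
    return out


def lgalb_check(incoming):
    """LGalB check rule (Definition 24): minimum of the incoming messages."""
    return min(incoming)


def leq(a, b):
    """Componentwise order on message vectors (-1 < +1)."""
    return all(x <= y for x, y in zip(a, b))


R = 4  # check-node degree used for the exhaustive check
check_inputs = list(product((1, -1), repeat=R - 1))
var_inputs = list(product((1, -1), repeat=2))

# Fact (ii): at a check node LGalB never outputs a better message than GalB.
dominated = all(lgalb_check(m) <= galb_check(m) for m in check_inputs)

# Fact (i): the LGalB check rule is monotone; the GalB parity rule is not.
lgalb_monotone = all(lgalb_check(a) <= lgalb_check(b)
                     for a in check_inputs for b in check_inputs if leq(a, b))
galb_monotone = all(galb_check(a) <= galb_check(b)
                    for a in check_inputs for b in check_inputs if leq(a, b))

# The variable rule is monotone in the incoming messages for each received value.
var_monotone = all(galb_variable(rv, a) <= galb_variable(rv, b)
                   for rv in (1, -1) for a in var_inputs for b in var_inputs if leq(a, b))

print(dominated, lgalb_monotone, galb_monotone, var_monotone)  # prints: True True False True
```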

Discussion: The LGalB is not a practical decoding algorithm 
but rather a convenient device for analysis; it is understood 
that we assume that the all-one codeword was transmitted and 
that quantities like the error probability refer to the variables 
decoded as —1. By some abuse of notation, we nevertheless 
refer to it as a decoder. 

The LGalB decoder is monotone also with respect to the 
incoming messages at check nodes. Moreover, it satisfies the 
following property. 

Lemma 25 (LGalB is Upper Bound on GalB): For any
graph G, any noise realization E, any starting set of "bad"
edges, and any $\ell$, we have $P_e^{\text{GalB}}(G, E, \ell) \le P_e^{\text{LGalB}}(G, E, \ell)$,
where $P_e(G, E, \ell)$ denotes the fraction of erroneous messages
after $\ell$ iterations of decoding.

Proof: Consider one iteration, i.e., a check-node step
followed by a variable-node step. Let $B_\ell^{\text{GalB}}$ and $B_\ell^{\text{LGalB}}$ denote the sets
of bad edges (edges with message −1) after the $\ell$-th iteration
of GalB and LGalB, respectively. Let $\psi^{\text{GalB}}(B)$ and $\psi^{\text{LGalB}}(B)$ denote the
set of bad edges after one iteration of the respective decoder, assuming that the initial
such set is $B$.

We use the following two facts: (i) The outgoing messages
of the LGalB decoder at variable/check nodes are monotone;
if we decrease (with respect to the natural order on $\mathcal{M}$)
the input at a variable/check node then the output either
decreases or stays the same. I.e., if $B \subseteq B'$, meaning that the
messages in $B'$ can be obtained by decreasing some of the
+1 messages in $B$ to −1, then $\psi^{\text{LGalB}}(B) \subseteq \psi^{\text{LGalB}}(B')$. (ii) For
any set of input messages, the outgoing message of LGalB is
less than or equal to the message of the GalB decoder, i.e.,
$\psi^{\text{GalB}}(B) \subseteq \psi^{\text{LGalB}}(B)$.

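For the proof, we proceed by induction. Let $B_0$ be the initial
set of bad edges. After the first iteration, from (ii) we get

$$B_1^{\text{GalB}} = \psi^{\text{GalB}}(B_0) \subseteq \psi^{\text{LGalB}}(B_0) = B_1^{\text{LGalB}}.$$

To complete the proof it is sufficient to show that $B_\ell^{\text{GalB}} \subseteq B_\ell^{\text{LGalB}}$ implies
$B_{\ell+1}^{\text{GalB}} \subseteq B_{\ell+1}^{\text{LGalB}}$. Using (i) and (ii) we have

$$B_{\ell+1}^{\text{LGalB}} = \psi^{\text{LGalB}}(B_\ell^{\text{LGalB}}) \supseteq \psi^{\text{LGalB}}(B_\ell^{\text{GalB}}) \supseteq \psi^{\text{GalB}}(B_\ell^{\text{GalB}}) = B_{\ell+1}^{\text{GalB}},$$

and hence the lemma. ■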
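The induction can be checked empirically on a small instance. The sketch below builds a (3, r)-regular graph with a configuration model (multi-edges are allowed), runs GalB and LGalB with the parallel schedule (check-node step followed by variable-node step) from the same channel values, and verifies $B_\ell^{\text{GalB}} \subseteq B_\ell^{\text{LGalB}}$ for every iteration. We assume the degree-3 GalB variable rule is the majority of the received value and the two incoming messages, consistent with the (l = 3, r) fixed-point equation of Lemma 22; the graph size, the noise level, and all function names are illustrative.

```python
import random
from math import prod


def random_graph(n, r, rng):
    """(3, r)-regular configuration model: edge j joins variable j // 3 and check perm[j] // r."""
    assert (3 * n) % r == 0
    perm = list(range(3 * n))
    rng.shuffle(perm)
    return [j // 3 for j in range(3 * n)], [perm[j] // r for j in range(3 * n)]


def run(decoder, var_of, chk_of, received, iters):
    """Parallel schedule; returns the bad-edge sets B_1, ..., B_iters (variable-to-check)."""
    num_edges = len(var_of)
    at_var, at_chk = {}, {}
    for e in range(num_edges):
        at_var.setdefault(var_of[e], []).append(e)
        at_chk.setdefault(chk_of[e], []).append(e)
    v2c = [received[var_of[e]] for e in range(num_edges)]  # start from the channel values
    history = []
    for _ in range(iters):
        c2v = [1] * num_edges
        for es in at_chk.values():  # check-node step: min (LGalB) or parity (GalB)
            for e in es:
                others = [v2c[h] for h in es if h != e]
                c2v[e] = min(others) if decoder == "LGalB" else prod(others)
        for v, es in at_var.items():  # variable-node step for degree 3
            for e in es:
                a, b = (c2v[h] for h in es if h != e)
                v2c[e] = 1 if received[v] + a + b > 0 else -1
        history.append({e for e in range(num_edges) if v2c[e] == -1})
    return history


rng = random.Random(1)
n, r, eps, iters = 600, 6, 0.04, 10
var_of, chk_of = random_graph(n, r, rng)
received = [-1 if rng.random() < eps else 1 for _ in range(n)]
bad_galb = run("GalB", var_of, chk_of, received, iters)
bad_lgalb = run("LGalB", var_of, chk_of, received, iters)
print(all(g <= l for g, l in zip(bad_galb, bad_lgalb)))  # set inclusion per iteration
```

Because Lemma 25 is a deterministic statement, the inclusion must hold for every graph and noise realization, not just on average.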
From the above lemma it suffices to prove the exchange of
limits for the linearized algorithm. Note that $\epsilon^{\text{LGalB}}$ as defined
in Lemma 22 is the threshold of the LGalB algorithm. We
will prove that for every $0 < \epsilon < \epsilon^{\text{LGalB}}$ and every $\alpha > 0$ there
exists an $n(\alpha, \epsilon)$ so that $\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\text{LGalB}}(G, \epsilon, \ell)] < \alpha$ for
$n > n(\alpha, \epsilon)$. As we will see later, the monotonicity property
of LGalB considerably simplifies the analysis. But the price
paid for the simplification is that the technique works only for
$\epsilon < \epsilon^{\text{LGalB}}$, which is slightly smaller than the DE threshold.

C. Marking Process 

Rather than analyzing the LGalB algorithm directly, we
analyze the associated marking process. This process is monotone
as a function of the iterations.

More precisely, we split the process into two phases: we
start with LGalB for $\ell(p)$ iterations to get the error probability
below $p$; we then continue with the marking process associated with
an infinite number of further iterations of LGalB. This means
that we mark any variable that is bad in at least one iteration
$\ell \ge \ell(p)$. Clearly, the number of variables that are bad at
at least one point in time $\ell \ge \ell(p)$ is an upper bound on the
maximum number of variables that are bad at any specific
instant in time.

The standard schedule of the LGalB is parallel, i.e., all 
incoming messages (at either variable or check nodes) are 
processed at the same time. This is the natural schedule for 
an actual implementation. For the purpose of analysis it is 
convenient to consider an asynchronous schedule. 

Here is how the general asynchronous marking process 
proceeds. We are given a graph G and a noise realization E. 
We are also given a set of marked edges. These marked edges 
are directed, from variable node to check node. At the start 
of the process mark the variable nodes that are connected 
to the marked edges. Declare all other variables and edges 
as unmarked. Unmarked edges do not have a direction. The 
process proceeds in discrete steps. At each step we pick a 
marked edge and we perform the processing described below. 
We continue until no more marked edges are left. Here are 
the processing rules: 

If the marked edge e goes from variable to check: 

• Let c be the check node connected to e. Declare e to be 
unmarked but mark all other edges connected to c; orient 
these marked edges from check to variable; 

If the marked edge e goes from check to variable: 

• Let v be the connected variable node. If v has a good 
associated channel realization and v is unmarked then 
mark v and declare e to be unmarked. 

• Let v be the connected variable node. If v has an asso- 
ciated bad channel realization or if v has an associated 
good channel realization but is marked: (i) mark v and 
all its outgoing edges; (ii) orient the edges from variable 
to check; (iii) unmark e. 
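The rules above can be sketched in Python as follows. The graph encoding (edge endpoint lists `var_of`, `chk_of`) and the function name are hypothetical, and we assume that each directed edge is processed at most once; this keeps the process finite on graphs with cycles and prevents a single edge that reaches a good variable twice from being counted as two distinct bad incoming messages. The order in which marked edges are picked is arbitrary, as in the text.

```python
from collections import defaultdict


def marking_process(var_of, chk_of, bad_channel, start_edges):
    """Asynchronous marking process (sketch). var_of[e], chk_of[e]: endpoints of edge e;
    bad_channel[v]: True if variable v was received in error; start_edges: initially
    marked edges, oriented variable -> check. Returns the set of marked variables."""
    edges_at_var, edges_at_chk = defaultdict(list), defaultdict(list)
    for e, (v, c) in enumerate(zip(var_of, chk_of)):
        edges_at_var[v].append(e)
        edges_at_chk[c].append(e)

    marked_vars = {var_of[e] for e in start_edges}
    pending = [(e, "v2c") for e in start_edges]  # marked edges still to be processed
    seen = set(pending)                          # directed edges marked so far

    def mark(e, direction):
        if (e, direction) not in seen:
            seen.add((e, direction))
            pending.append((e, direction))

    while pending:
        e, direction = pending.pop()
        if direction == "v2c":
            # Unmark e and mark all other edges at its check node, oriented check -> variable.
            for f in edges_at_chk[chk_of[e]]:
                if f != e:
                    mark(f, "c2v")
        else:
            v = var_of[e]
            if not bad_channel[v] and v not in marked_vars:
                marked_vars.add(v)  # good and unmarked: mark v, the edge dies
            else:
                marked_vars.add(v)  # bad channel, or good but already marked
                for f in edges_at_var[v]:
                    if f != e:
                        mark(f, "v2c")
    return marked_vars


# Toy graph: check 0 joins variables 0, 1, 2; check 1 joins variables 1, 2, 3.
var_of, chk_of = [0, 1, 2, 1, 2, 3], [0, 0, 0, 1, 1, 1]
print(marking_process(var_of, chk_of, [True, False, False, False], [0]))
print(marking_process(var_of, chk_of, [True, True, False, False], [0]))
```

In the second call, variable 2 is reached once through each check node and therefore passes the marking on, which is exactly the "good but marked" rule.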
Let $\mathcal{M}(G, E, S)$ denote the set of marked variables assuming
that we start with the set of marked edges $S$ and that we
run the asynchronous marking process. Let $M(G, E, S) =
|\mathcal{M}(G, E, S)|$. As a special case, let $\mathcal{M}(G, E, \ell)$ denote the set
of marked variables at the end of the process assuming that
the initial set of marked edges is the set of bad edges after $\ell$
rounds of LGalB. As before, $M(G, E, \ell) = |\mathcal{M}(G, E, \ell)|$.

It is not hard to see that for any $\ell' \ge \ell$, $P_b^{\text{LGalB}}(G, E, \ell') \le
M(G, E, \ell)/n$: for $\ell' = \ell$ both processes start with the same
set of bad edges and both operate on the same graph
and noise realization. At the check-node side the processing
rules are identical. At the variable-node side both processes
also behave in the same way if they encounter a variable
node with a bad channel realization. The difference lies in the
behavior when they encounter a variable node with a good
channel realization. In such a case the outgoing message of
LGalB is bad only if there are two bad messages entering
at the same time instant. The asynchronous marking process
declares the outgoing message to be bad if there
are two incoming bad messages, even if the two messages
might correspond to different time instants as measured by
the parallel schedule. We conclude that for $\ell' \in \mathbb{N}$

$$\limsup_{\ell\to\infty}\mathbb{E}\big[P_b^{\text{LGalB}}(G, \epsilon, \ell)\big] \le \frac{1}{n}\mathbb{E}\big[M(G, E, \ell')\big]. \qquad (19)$$
D. Witness 

It remains to bound $\mathbb{E}[M(G, E, \ell)]$. Assume at first that we
take a random graph G and a random noise realization E and
that we start the marking process with a sufficiently small
random set of marked edges (and not the set of bad edges
after $\ell$ iterations of LGalB). In this case one can show that
the number of marked nodes at the end of the process is 
with high probability not more than a constant multiple of 
the size of the starting set. To prove this statement, we use 
the fact that the graph, the noise, and the starting set of edges 
are all independent. Therefore, the marking process behaves 
essentially like a birth and death process: we pick an edge 
and we explore its neighborhood; with a certain probability the 
edge dies (if it enters a variable node with a correctly received 
value) and with a certain probability the edge spawns some 
children. As long as the expected number of new children is 
less than 1 the process eventually dies with probability 1. 
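The birth and death heuristic can be illustrated with a small simulation. As a rough tree-like model for l = 3 (an assumption made only for this illustration), an edge that enters a variable with a bad received value (probability ε) spawns 2(r − 1) new edges through the two neighboring check nodes, and otherwise dies, so the mean number of children is m = 2(r − 1)ε. For m < 1 the expected total number of explored edges of such a subcritical Galton-Watson process is 1/(1 − m). The case of a good variable that has already been marked is ignored here, and the function name is hypothetical.

```python
import random


def total_progeny(eps, r, rng, cap=10**6):
    """Edges explored by a Galton-Watson process where each edge spawns 2(r - 1)
    children with probability eps (bad variable) and none otherwise."""
    alive, total = 1, 0
    while alive and total < cap:
        alive -= 1
        total += 1
        if rng.random() < eps:
            alive += 2 * (r - 1)
    return total


rng = random.Random(0)
eps, r, runs = 0.03, 6, 20000
m = 2 * (r - 1) * eps  # mean number of children per edge
est = sum(total_progeny(eps, r, rng) for _ in range(runs)) / runs
print(m, est, 1 / (1 - m))  # empirical mean versus 1 / (1 - m)
```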

Unfortunately our situation is more involved. After $\ell$
iterations the starting set of marked edges is correlated, both
with the graph as well as with the noise realization. Our aim 
therefore is to reduce this correlated case to the uncorrelated 
case by a sequence of transformations. As a first step we show 
how to get rid of the correlation with respect to the noise 
realization. 

Consider a fixed graph G. Assume that we have performed
$\ell$ iterations of LGalB. For each edge e that is bad in the $\ell$-th
iteration we construct a "witness." A witness for e is a subset
of the computation tree of height $\ell$ (where height is counted as
the number of variable node levels) for e consisting of paths
that carried bad messages in the past iterations. We construct
the witness recursively starting with e. Orient e from check
node to variable node. At any point in time while constructing
the witness associated with e we have a partial witness that is
a tree with oriented edges. The initial such partial witness is
e. One step in the construction consists of taking a leaf edge
of the partial witness and "growing it out" according to the
following rules.

If an edge enters a variable node that has an incorrect 
received value then add the smallest (according to some fixed 
but arbitrary order on the set of edges) edge that carries an 
incorrect incoming message to the witness and continue the 
process along this edge. The added edge is directed from 
variable node to check node. If an edge enters a variable 
node that has a correct received value then add both incoming 
edges to the witness and follow the process along both edges. 
(Note that in this case both of these edges must have carried 
bad messages.) Again, both of these edges are directed from 
variable to check node. If an edge enters a check node then 
choose the smallest incoming edge that carries an incorrect 
message and add it to the witness. Continue the process along 
this edge. The added edge is directed from check to variable 
node. Continue the process until depth $\ell$. Fig. 1 shows an
example for l = 3, r = 4, and $\ell$ = 3.




Fig. 1. Construction of the witness for a bad edge e. The dark variables
represent channel errors. The part of the tree with dark edges represents the
witness; the thick edges, both dark and gray, represent the bad
messages in the past iterations. The number h on the left indicates the height
of the tree.



Denote the union of all witnesses for all edges that are
bad in the $\ell$-th iteration by $W(G, E, \ell)$. We simply call it the
witness. The witness is a part of the graph that on its own 
explains why the set of bad edges after £ iterations is bad. 

How large is W? The larger $\ell$, the fewer bad edges we
expect to see in iteration $\ell$. On the other hand, the size of
the witness for each bad edge grows as a function of $\ell$. The
next lemma, whose proof can be found in Appendix B, asserts
that the first effect dominates and that the expected size of W,
normalized by n, converges to zero as the number of iterations increases.

Lemma 26 (Size of Witness): Consider the (3, r)-regular
ensemble. For $0 < \epsilon < \epsilon^{\text{LGalB}}$,

$$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}\big[|W(G, E, \ell)|\big] = o_\ell(1).$$

Why do we construct a witness? It is intuitive that if we keep 
the witness fixed but randomize the structure as well as the 
received values on the remainder of the graph then the situation 
should only get worse: already the witness itself explains all 
the bad messages and hence any further bad channel values 
can only create more bad messages. In the next two sections 






we show that under some suitable technical conditions this 
intuition is indeed correct. 

E. Randomization 

A witness W consists of two parts, (i) the graph structure 
of W and (ii) the channel realizations of the variables in W. 
We will often need to refer to either of these parts on their 
own. By some abuse of notation we write W also if we refer 
only to the graph structure or only to the channel realizations. 
The usage should be clear from the context. As an example, 
we write $W \subseteq G$ to indicate that G contains W as a subgraph
and we write $W \subseteq E$ to indicate that the received values of
all variables in W agree with the values that these variables
take on in E.

Fix a graph G and a witness W, $W \subseteq G$. Let $\mathcal{E}_{G,W}$ denote
the set of all error realizations E that give rise to W, i.e.,
$W(G, E, \ell) = W$. Clearly, for all $E \in \mathcal{E}_{G,W}$ we must have
$W \subseteq E$. In words, on the set of variables fixed by the witness
the errors are fixed by the witness itself. Therefore, the various
E that create this witness differ only on $G \setminus W$. As a convention,
we define $\mathcal{E}_{G,W} = \emptyset$ if $W \not\subseteq G$.

Let $\mathcal{E}'_{G,W}$ denote the set of projections of $\mathcal{E}_{G,W}$ onto the
variables in $G \setminus W$. Let $E' \in \mathcal{E}'_{G,W}$. Think of $E'$ as an element
of $\{0, 1\}^{|G \setminus W|}$, where 0 denotes a correct received value and
1 denotes an incorrect received value. In this way, $\mathcal{E}'_{G,W}$ is a
subset of $\{0, 1\}^{|G \setminus W|}$.

This is important: $\mathcal{E}'_{G,W}$ has structure. We claim that, if $E' \in
\mathcal{E}'_{G,W}$ then $\mathcal{E}'_{G,W}$ also contains every $E'' \le E'$ (in the partial order defined in Appendix D).
More precisely, if the noise realization $E' \in \mathcal{E}'_{G,W}$ gives rise to
the witness W then converting any incorrect received value in
$E'$ to a correct one will also give rise to W. This is true since
the LGalB algorithm is monotone, so that taking away some
incorrectly received values cannot enlarge the set of bad
edges observed in the $\ell$-th iteration. But on the other hand,
W itself ensures that the set of bad edges after $\ell$ iterations
includes all the bad edges we saw originally. The proof of the
following lemma relies heavily on this property.

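Lemma 27 (Channel Randomization): Fix G and let $W \subseteq
G$. Let $\mathbb{E}_{E'}[\cdot]$ denote the expectation with respect to the channel
realizations $E'$ in $G \setminus W$. Then

$$\mathbb{E}_{E'}\big[M(G, (W, E'), W)\,\mathbb{1}_{\{E' \in \mathcal{E}'_{G,W}\}}\big] \le \mathbb{E}_{E'}\big[M(G, (W, E'), W)\big]\,\mathbb{E}_{E'}\big[\mathbb{1}_{\{E' \in \mathcal{E}'_{G,W}\}}\big]. \qquad (20)$$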
Discussion: Lemma 27 has the following important operational
significance. If we divide both sides by $\mathbb{E}_{E'}[\mathbb{1}_{\{E' \in \mathcal{E}'_{G,W}\}}]$, the
left-hand side is the expected number of marked variables, where
the expectation is computed over all those channel realizations
that give rise to the given witness W, whereas the right-hand
side gives the expectation over all channel realizations (outside
the witness) regardless of whether they give rise to W or not.
Clearly, the right-hand side is much easier to compute, since
the channel is now independent of W. The lemma states that,
if we assume that the channel outside W is independently
chosen, then we get an upper bound on the number of marked
variables.

Proof: Let $n' = |G \setminus W|$. Let $P\{\cdot\}$ be the probability
measure associated with $\mathbb{E}_{E'}[\cdot]$, i.e., $P\{E'\} = \epsilon^{n_1}\bar{\epsilon}^{\,n'-n_1}$,
where $n_1$ denotes the number of ones in $E'$. Let $f(E')$ denote
the function $\mathbb{1}_{\{E' \in \mathcal{E}'_{G,W}\}}$, and let $g(E')$ denote the function
$M(G, (W, E'), W)$. Note that $f$ is a decreasing function on
$\{0, 1\}^{n'}$ because if $f(E') = 1$ then $f(E'') = 1$ for all $E'' \le E'$.
Further, $g$ is an increasing function on $\{0, 1\}^{n'}$ since LGalB is monotone
in the number of channel errors. Since $g(E') \le n$, $n - g$ is non-negative
and it is a decreasing function. For $s, t \in \{0, 1\}^{n'}$,
let $|s|$ denote the number of 1s in $s$, and let $s \vee t$ and $s \wedge t$ be as
defined in Appendix D. Then,

$$P\{s\}P\{t\} = \epsilon^{|s|+|t|}\,\bar{\epsilon}^{\,2n'-|s|-|t|},$$
$$P\{s \vee t\} = \epsilon^{|s|+|t|-|s\wedge t|}\,\bar{\epsilon}^{\,n'-(|s|+|t|-|s\wedge t|)},$$
$$P\{s \wedge t\} = \epsilon^{|s\wedge t|}\,\bar{\epsilon}^{\,n'-|s\wedge t|}.$$

Therefore, $P\{s\}P\{t\} = P\{s \vee t\}P\{s \wedge t\}$. Applying the FKG
inequality in the form of Lemma 37 to $f$ and $n - g$, we get

$$\mathbb{E}[f(n - g)] \ge \mathbb{E}[f]\,\mathbb{E}[n - g].$$

This implies $\mathbb{E}[fg] \le \mathbb{E}[f]\,\mathbb{E}[g]$. ■
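The two ingredients of this proof, the lattice identity $P\{s\}P\{t\} = P\{s \vee t\}P\{s \wedge t\}$ for the product measure and the resulting inequality $\mathbb{E}[fg] \le \mathbb{E}[f]\mathbb{E}[g]$ for a decreasing $f$ and an increasing $g$, can be verified by brute force on a small cube. The functions `f` and `g` below are hypothetical stand-ins that share only the monotonicity properties used in the proof (a down-set indicator and an increasing count); the values of $n'$ and $\epsilon$ are illustrative.

```python
from itertools import product


def prob(s, eps):
    """Product (i.i.d. BSC) measure: P{s} = eps^{|s|} (1 - eps)^{n' - |s|}."""
    k = sum(s)
    return eps ** k * (1 - eps) ** (len(s) - k)


def join(s, t):
    """Componentwise maximum, s v t."""
    return tuple(max(a, b) for a, b in zip(s, t))


def meet(s, t):
    """Componentwise minimum, s ^ t."""
    return tuple(min(a, b) for a, b in zip(s, t))


def f(s):
    """Hypothetical decreasing function: indicator of a down-set."""
    return 1.0 if sum(s) <= 1 or (s[0] == 0 and s[1] == 0) else 0.0


def g(s):
    """Hypothetical increasing function (stand-in for the number of marked variables)."""
    return float(s[0] + 2 * s[1] + sum(s))


n_prime, eps = 5, 0.2
cube = list(product((0, 1), repeat=n_prime))


def expect(h):
    """Expectation of h under the product measure on the cube."""
    return sum(prob(s, eps) * h(s) for s in cube)


lattice_ok = all(
    abs(prob(s, eps) * prob(t, eps) - prob(join(s, t), eps) * prob(meet(s, t), eps)) < 1e-12
    for s in cube for t in cube
)
lhs = expect(lambda s: f(s) * g(s))
rhs = expect(f) * expect(g)
print(lattice_ok, lhs, rhs, lhs <= rhs)
```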
We can now upper bound the right-hand side of (19). The
proof of the next lemma can be found in Appendix C.

Lemma 28 (Markov Inequality): Consider the $(l = 3, r)$-regular
ensemble and transmission over the BSC($\epsilon$). Let (G, E)
be chosen uniformly at random. Let $\ell \in \mathbb{N}$ and $\theta > 0$ be such that
$\mathbb{E}[|W(G, E, \ell)|] \le \theta^2 n$. Then

$$\mathbb{E}[M(G, E, \ell)] \le \sum_{W:|W| \le \theta n}\sum_{G} P\{G\}\,P\{\mathcal{E}_{G,W}\}\,\mathbb{E}_{E'}\big[M(G, (W, E'), W)\big] + \theta n.$$

F. Back to Expansion 

In the previous section we have shown that for a fixed graph
G and a given witness W, we can ignore the correlations
between the witness and the channel values in G\W and 
consider those channel values to be chosen independently. But 
the graph structure of G\W is still correlated with W. Let us 
now deal with this correlation and get a bound on the marking 
process for those G that have an expansion close to the typical 
one of the ensemble. 

Consider the following random process, which we call the
R-process. The process proceeds in discrete steps and has state
$(C_t, S_t, B_t, I_t)$ at time $t$, where each component is an integer.
We initialize the process with $(C_0, S_0, B_0, I_0) = (0, S_0, 0, 0)$,
where $S_0 \in \mathbb{N}$.

At each step we have two choices. We can either perform a 
regular step or a boundary step. The effect of each step type on 
the state $(C_t, S_t, B_t, I_t)$ is shown in Table III. If we choose
a regular step then, with probability $\epsilon$, an extension step is
executed and, with probability $\bar{\epsilon} = 1 - \epsilon$, a pruning step is performed.
The choices of extension step versus pruning step are iid.

In our choice of step type we are restricted by the following:
at any time during the process the state has to satisfy

$$\gamma r C_t \le S_t + B_t + I_t, \qquad (21)$$

where $\gamma = 1 - \frac{1+\delta}{r}$ for some strictly positive number $\delta$. Let
$T$ be the smallest time $t$ so that $S_t = 0$. It is convenient to
formally define the process for all $t$ by setting $U_t = U_T$ for
$t > T$, where $U_t = (C_t, S_t, B_t, I_t)$.
Step               ΔC_t   ΔS_t        ΔB_t   ΔI_t
regular extend *   +2     +(2r − 3)    0     +1
regular extend     +1     +(r − 3)     0     +1
regular extend      0     −3           0     +1
regular prune       0     −1          +1      0
boundary *         +1     +(r − 2)    −1     +1
boundary            0     −2          −1     +1

TABLE III

Possible state transitions. Note that there are several
possible transitions corresponding to a "regular extend" step
as well as a "boundary" step. As explained below, the
transitions indicated in bold letters (marked * above) dominate the other
transitions in the sense of Definition 29.



Discussion: Here is the interpretation of the above process. 
We are given a fixed graph G and a witness W. The channel 
realizations in G\W are generated independently with proba- 
bility of error e. We are interested in computing the expected 
number of marked variables E E / [M(G, (W, E'), W)]. 

The components of the state vector have the following 
interpretation. By some further abuse of notation, let W refer
now also to the variables contained in W. Let $\mathcal{N}(W)$ denote
all the check nodes that neighbor W. We start our process
with those edges connected to $\mathcal{N}(W)$ that do not connect to
W. The cardinality of this set is denoted by $S_0$ (where the "S"
stands for surviving). In each step we take a single edge from
this set of surviving edges and "grow it out."

Let us discuss this process in more detail. When we "grow 
out" an edge we first visit the connected variable node. 
Suppose that this is the first time that the process visits this 
variable node. We call this a regular step. 

If the received value of this variable node is good then we 
stop the process along this edge. We add the variable to the 
boundary set to make a mental note that we have seen this 
node exactly once. The boundary set has cardinality $B_t$. We
further subtract 1 from $S_t$ to take into account that we finished
processing one of the "surviving" edges.

If the received value is bad then we add this variable node
to the internal variable nodes. The cardinality of this set is
$I_t$. This means that in this step we increase $I_t$ by 1. Further,
we expand the graph along the two outgoing edges, add the
(at most) two connected check nodes to the set of internal
check nodes (whose cardinality is denoted by $C_t$) and add all
the remaining edges that emanate from these check nodes to
the set of surviving edges. This adds (at most) $2(r-1)$ new
survivors, but we have to subtract the edge we started from.
Therefore, $S_t$ is increased by at most $2r - 3$.

So far we have assumed that we have not seen the variable 
node (that is connected to the edge which we grow out) before. 
Suppose now that, to the contrary, the variable is an element 
of the boundary. We know that in this case the received value 
is good, but we also know that the variable received another 
bad incoming message. Therefore, the variable will send a bad 
outgoing message along its remaining edge. Hence, we move
this variable node from the boundary to the internal set (this
decreases $B_t$ by 1 and increases $I_t$ by 1). Further, we grow out
the graph along the only remaining outgoing edge. This adds
at most one new check node and at most $r - 1$ outgoing edges
to the set of surviving edges. Discounting again the edge we
started with, we add in total at most $r - 2$ to $S_t$.

Suppose that the graph G is a right expander, i.e., $G \in
X(l, r, \alpha, \gamma)$, where $\gamma > 1 - \frac{1+\delta}{r}$ for some strictly positive $\delta$.
This means that every collection C of check nodes of size at
most $\alpha n$ has at least $\gamma|C|r$ connected variable nodes. Consider
the state of the system at some time $t$. At this point in time we
have $C_t$ check nodes. All these check nodes are "internal," i.e.,
each of their neighboring variable nodes is either counted in $B_t$
or $I_t$, or is yet to be encountered by the process; the number
of the latter cannot exceed the number of survivors $S_t$. We know that G is
an expander; suppose for now that $C_t \le \alpha n$. Then the
number of connected variable neighbors must
be at least $\gamma r C_t$, i.e., at any time during the process the state
should satisfy

$$\gamma r C_t \le S_t + B_t + I_t. \qquad (22)$$

We claim that

$$\gamma r C_t \le S_t + B_t + I_t - (1 - \delta) \qquad (23)$$

is a necessary condition to be able to perform a boundary step
at time $t$. To see this, suppose we take a boundary step. If
you look at Table III you will see that there are two possible
transitions. One can check that the transition stated in bold
letters gives the less restrictive condition. Let us therefore only
focus on this case. The state after applying the boundary step
must still fulfill (22). This means that we must have

$$\gamma r (C_t + 1) \le (S_t + r - 2) + (B_t - 1) + (I_t + 1).$$

Since $\gamma r = r - 1 - \delta$, this inequality is equivalent to (23), which proves the claim.

From the above discussion we claim that for a given W
and G, where $G \in X(l, r, \alpha, \gamma)$, as long as $C_t \le \alpha n$
the marking process can be modeled as the R-process.
The random variable $I_\infty$ is equal to the random variable
$M(G, (W, E'), W) - |W|$ of the marking process (we subtract
the size of the witness because we do not include it in the internal
variables). For the actual marking process the decision of
whether a regular step or a boundary step is taken is forced
by the structure of the graph and our choice of which edge to
grow out. For the R-process the role of the graph is taken by a
strategy. A strategy is any (randomized) decision function F 
that, based on the initial state and past decisions and outcomes, 
decides whether a regular step or a boundary step is taken at 
any point in time. 

Here is the connection between the actual physical process 
and the R-process in more detail. Assume we are given a 
graph G and a witness W. We know the graph and therefore 
we also know which edges of the graph are elements of the 
surviving set. Therefore, when we pick a survivor, we know 
in advance whether the step is a regular step or a boundary 
step. The noise realization, which is not known to us a priori, 
determines whether a regular step is a regular extend or prune 
step. We see that each graph gives rise to a strategy. As long as 
the size of all revealed nodes is sufficiently small this strategy 
will be admissible since the expansion will be valid up to this 
point. 

Since we are only interested in an upper bound on the
number of marked variables, we allow the R-process to use
an arbitrary strategy, only limited by the condition (22). We
call a strategy which obeys (22) an admissible strategy. Since
the actual physical process is also limited by (22) (under the
condition that the graph is an expander and the process has
not grown beyond the size where the expansion is valid), it
suffices to derive upper bounds on $\mathbb{E}[I_\infty]$ that are valid for all
choices of the strategy.

We relax one further restriction imposed by the actual 
physical process in order to simplify our task. Again, this only 
increases $\mathbb{E}[I_\infty]$. In the marking process, we can only perform
a boundary step if the boundary set is non-empty. In other
words, we require $B_t > 0$ for a boundary step to be performed.
We lift this restriction for the R-process.

Definition 29 (Ordering of States): The state $U =
(C, S, B, I)$ dominates the state $U' = (C', S', B', I')$,
denoted by $U \ge U'$, if

(i) $S \ge S'$,

(ii) $I \ge I'$,

(iii) $S + B + I - \gamma r C \ge S' + B' + I' - \gamma r C'$.



Lemma 30 (Monotonicity of $I_\infty$ with State): Consider the
R-process with admissible strategy $F'$ and initial state $U' =
(C', S', B', I')$. Let $U = (C, S, B, I)$ be an initial state
which dominates $U'$, i.e., $U \ge U'$. Then there exists an
admissible strategy $F$ so that $\mathbb{E}[I_\infty(U, F)] \ge \mathbb{E}[I_\infty(U', F')]$,
where $I_\infty(U, F)$ denotes $I_\infty$ assuming that the R-process is
initialized with $U$ and that the process uses the strategy $F$.

Proof: Given $U'$ and the admissible strategy $F'$ we
construct the admissible strategy $F$ in the following way. The
process with initial state $U$ uses strategy $F'$ but applies it to the
pseudo state $U'$. Further, it updates its pseudo state according
to the realization of the process and bases its future decisions
on strategy $F'$ applied to this evolving pseudo state. Call the
phase of the process until the pseudo state has reached $S' = 0$
the "initial" phase of the process. At that point the $(U, F)$
process switches to any admissible strategy based on its real 
state. To be concrete, assume that it uses a greedy strategy at 
this point. This means that the process performs a boundary 
step any time it is admissible. 

In order to show the desired inequality on the expected 
values we couple the processes (U',F') and (U,F). We 
imagine that we run both processes in parallel and that they 
experience exactly the same randomness (this refers to the 
randomness contained in the choice of the transitions as well as 
any randomness which might be used by the strategy). Assume 
for the moment that strategy F is admissible. 

In the initial phase of the algorithm (until the $(U', F')$
process stops because $S'_t = 0$) the $(U, F)$ process proceeds
in lock-step with the $(U', F')$ process. Since $S_0 \ge S'_0$ and
since $S_t - S_0 = S'_t - S'_0$ it follows that $S_t \ge S'_t$ in this initial
phase. This means that the process $(U, F)$ never stops before
the process $(U', F')$. Further, $I_0 \ge I'_0$, $I_t - I_0 = I'_t - I'_0$,
and $I_t$ is a non-decreasing function. It follows that for every
realization $I_\infty(U, F) \ge I_\infty(U', F')$. This implies, a fortiori,
the claimed inequality on the expected values.

Let us now show that the strategy $F$ is admissible. We
claim that for all $t \in \mathbb{N}$

$$S_t + B_t + I_t - \gamma r C_t \ge S'_t + B'_t + I'_t - \gamma r C'_t. \qquad (24)$$

By definition this is true for $t = 0$. But by construction of the
coupling, $S_t - S_0 = S'_t - S'_0$, $I_t - I_0 = I'_t - I'_0$, $B_t - B_0 =
B'_t - B'_0$, and $C_t - C_0 = C'_t - C'_0$. It follows that the left-hand
side in (24) is always at least as large as the right-hand side.
Therefore, if $F'$ is admissible then so is $F$. ■
From Table III we see that for regular extend and boundary
steps there are several possible outcomes. For each of these
two steps, there is a single outcome (highlighted in the table)
whose resulting state dominates those of the other outcomes.
Since we are interested in an upper bound on $I_\infty$, thanks to the
above lemma, we can restrict our attention to these dominating
steps.
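The dominance claim can be checked mechanically. The sketch below encodes the transitions read off from the step descriptions in the Discussion above (regular extend: the two outgoing edges reach two new, one new, or no new check nodes; boundary: the remaining edge reaches a new or an already internal check node), lists the dominating outcome first, and tests Definition 29 with exact rational arithmetic for the illustrative choice r = 6, δ = 1/11. Since all three conditions of Definition 29 are linear, the choice of the current state does not matter. All names are hypothetical.

```python
from fractions import Fraction


def transitions(r):
    """Changes (dC, dS, dB, dI) per step type; the dominating outcome is listed first."""
    return {
        "regular extend": [(2, 2 * r - 3, 0, 1), (1, r - 3, 0, 1), (0, -3, 0, 1)],
        "regular prune": [(0, -1, 1, 0)],
        "boundary": [(1, r - 2, -1, 1), (0, -2, -1, 1)],
    }


def dominates(u, v, gamma_r):
    """Definition 29: state u = (C, S, B, I) dominates state v."""
    (c1, s1, b1, i1), (c2, s2, b2, i2) = u, v
    return s1 >= s2 and i1 >= i2 and s1 + b1 + i1 - gamma_r * c1 >= s2 + b2 + i2 - gamma_r * c2


def advance(state, change):
    """Apply a transition (dC, dS, dB, dI) to a state (C, S, B, I)."""
    return tuple(x + d for x, d in zip(state, change))


r, delta = 6, Fraction(1, 11)
gamma_r = r - 1 - delta  # gamma * r for gamma = 1 - (1 + delta) / r
state = (3, 40, 2, 7)    # arbitrary current state
for step, outcomes in transitions(r).items():
    best = advance(state, outcomes[0])
    print(step, all(dominates(best, advance(state, d), gamma_r) for d in outcomes))
```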

Consider the greedy strategy, call it $F^g$. For this greedy
strategy, whenever (23) is true we perform a boundary step.

Lemma 31 (Domination of the Greedy Process): For a
given initial state $U = (C_0, S_0, B_0, I_0)$ and any admissible
strategy $F$, we have

$$\mathbb{E}[I_\infty(U, F^g)] \ge \mathbb{E}[I_\infty(U, F)].$$
Proof: Again we construct a coupling between the
processes $(U, F)$ and $(U, F^g)$. As remarked above, for both
processes we can assume that the state transitions are the ones
indicated in bold in Table III. The only randomness therefore
resides in whether for a regular step the process extends or
prunes and, possibly, in the randomness used for the strategy
$F$. There is no randomness involved in any boundary steps.
The coupling consists in coupling, for each regular step $i$,
$i \in \mathbb{N}$, the outcomes of these regular steps. In more detail, if
for the process $(U, F)$ the $i$-th regular step results in a pruning
then the same occurs for the $i$-th regular step of the process
$(U, F^g)$. By construction, for all regular steps the change of
$S$, $I$, $B$, and $C$ is the same for both processes. Assume we
measure "time" not in the absolute number of steps taken
but by the number of regular steps taken. Consider a process
$(U, F)$ and assume that this process is still "alive" at "time" $t$.
Then its state $U_t$ only depends on the realization of the random
variables during the regular steps and on the total number of
boundary steps taken, but it does not depend on the order of
the steps taken.

Since the process $(U, F^g)$ has by definition done at least as
many boundary steps as the process $(U, F)$, it further follows
that if we compare the two processes at "time" $i$ corresponding
to $i$ regular steps then the number of survivors (and also the
number of internal nodes) for $(U, F^g)$ is at least as large as the
number of survivors for $(U, F)$. Therefore, if at this time the
process $(U, F)$ is still alive then so is the process $(U, F^g)$ and
the latter has at least as many accumulated internal variable
nodes as the former. This proves our claim. ■

Since we are interested in upper bounding $\mathbb{E}[I_\infty]$, it is
sufficient to bound $\mathbb{E}[I_\infty(U, F^g)]$, which is done in the next
lemma. We use large deviation properties of the sub-critical
Galton-Watson process. For the convenience of the reader we
provide this estimate in Appendix E.

Lemma 32 (Birth Death Process): Let the initial state be $U = (0, S_0, 0, 0)$. Fix a strictly positive $\delta < \frac{1}{2(r-1)}$ so that $\frac{1-\delta}{2\delta} \in \mathbb{N}$, and let $\gamma = 1 - \frac{1+\delta}{r}$. For all $\epsilon < \frac{1}{2(r-1)}$ there exist constants $c = c(l, r, \epsilon, \delta)$, $c > 1$, and $c' = c'(l, r, \epsilon, \delta) > 0$ so that

$$P\{I_\infty(U, F^g) > c S_0\} < e^{-c' S_0}.$$

Proof: Since condition (23) is satisfied in the beginning, the greedy R-process starts with some boundary steps. We claim that after exactly $\lfloor \frac{S_0}{1-\delta} \rfloor$ such boundary steps the condition (23) is for the first time no longer fulfilled. To see this, ignore the integer constraint for a moment. At the beginning of the process the condition (23) reads $0 < S_0 - (1-\delta)$. After $\frac{S_0}{1-\delta}$ boundary steps this condition is transformed to

$$\frac{S_0}{1-\delta}(r-1-\delta) < S_0 + \frac{S_0}{1-\delta}(r-2) - (1-\delta),$$

which is equivalent to $0 < -(1-\delta)$. We see that the inequality is no longer fulfilled, and it is easy to check that this is the first time that it is no longer fulfilled.

After the initial boundary steps, the greedy strategy performs regular steps until exactly $\frac{1-\delta}{2\delta}$ regular extend steps are performed and then follows them by exactly one boundary step. This sequence is then repeated. (Note that by our assumption $\frac{1-\delta}{2\delta} \in \mathbb{N}$.)

To see this, note that each regular extend step increases the right-hand side of (22) by $2(r-1)$ and the left-hand side by $2(r-1-\delta)$. Further, each boundary step increases the left-hand side by $r-1-\delta$ and the right-hand side by $r-2$. Since

$$\frac{1-\delta}{2\delta}\,2(r-1-\delta) + (r-1-\delta) = \frac{1-\delta}{2\delta}\,2(r-1) + (r-2),$$

we see that after one such sequence of regular extend steps followed by a boundary step, both sides of the inequality have increased by the same amount, so the inequality is unchanged. (A regular prune step does not change the condition (22).)
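The bookkeeping in the last two paragraphs can be checked mechanically. The sketch below is our own illustration, not part of the original argument. It assumes, as read off from the text, that the condition has the strict form lhs < rhs with lhs = 0 and rhs = $S_0 - (1-\delta)$ at the start, that a boundary step adds $r-1-\delta$ to lhs and $r-2$ to rhs, and that a regular extend step adds $2(r-1-\delta)$ to lhs and $2(r-1)$ to rhs. The helper names are hypothetical; exact rational arithmetic avoids rounding right at the boundary of the inequality.

```python
from fractions import Fraction


def initial_boundary_steps(S0, r, delta):
    """Count boundary steps taken while the (assumed strict) condition lhs < rhs holds."""
    lhs = Fraction(0)
    rhs = Fraction(S0) - (1 - delta)
    steps = 0
    while lhs < rhs:
        lhs += r - 1 - delta  # boundary step, left-hand side
        rhs += r - 2          # boundary step, right-hand side
        steps += 1
    return steps


def cycle_is_balanced(r, delta):
    """Check that (1-delta)/(2*delta) extend steps plus one boundary step leave rhs - lhs unchanged."""
    k = (1 - delta) / (2 * delta)
    if k.denominator != 1:
        raise ValueError("(1 - delta) / (2 * delta) must be an integer")
    slack_per_extend = 2 * (r - 1) - 2 * (r - 1 - delta)  # equals 2*delta
    slack_per_boundary = (r - 2) - (r - 1 - delta)        # equals -(1 - delta)
    return k * slack_per_extend + slack_per_boundary == 0


delta = Fraction(1, 11)  # for r = 6: delta < 1/10 and (1 - delta)/(2*delta) = 5
print(initial_boundary_steps(7, 6, delta))  # S0/(1-delta) = 7.7, prints 7
print(cycle_is_balanced(6, delta))          # prints True
```

Under this strict reading, when $S_0/(1-\delta)$ happens to be an integer (for example $S_0 = 10$ above) the loop stops one step earlier, which is the integer effect the text sets aside.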

Since the randomness is contained only in the regular steps, we can model the process as consisting of regular steps only. To include the effect of boundary steps, we alter the outcome of the regular extend step as follows. From Table III note that each regular extend step increases $S$ by $2r-3$ and $I$ by 1. Since one boundary step occurs for every $\frac{1-\delta}{2\delta}$ regular extend steps, we include the effect of the boundary steps by changing this to an increment of $2r-3+(r-2)\frac{2\delta}{1-\delta}$ for $S$ and $1+\frac{2\delta}{1-\delta}$ for $I$, respectively.

Now this process is a standard birth and death process. Recall that we have $\epsilon < \frac{1}{2(r-1)}$ and $\delta < \frac{1}{2(r-1)}$. Hence, the expected increase in $S$ at each step is $\epsilon\big(2r-3+(r-2)\frac{2\delta}{1-\delta}\big)$. This is strictly less than 1: the bound on $\delta$ gives $(r-2)\frac{2\delta}{1-\delta} < \frac{2(r-2)}{2r-3} < 1$, so the expression is less than $\epsilon\,(2r-2) < 1$. As discussed in more detail in Appendix E, this shows that, except for an exponentially small probability, this process stops for $t < cS_0$ for some appropriate constant $c > 1$. This proves our lemma since in each step we create at most $1+\frac{2\delta}{1-\delta}$ internal variables. ■
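As a sanity check on the drift computation, the sketch below evaluates the effective increments with exact fractions and confirms that the expected growth $\epsilon\big(2r-3+(r-2)\frac{2\delta}{1-\delta}\big)$ stays below 1 even at the endpoints $\epsilon = \delta = \frac{1}{2(r-1)}$. Reading each regular step as "extend with probability $\epsilon$", which is what the formula above implies, gives the Appendix E setting with $p = \epsilon$ and $\mu = \epsilon$ times the effective increment; that mapping is our interpretation, and the function names are illustrative.

```python
from fractions import Fraction


def effective_increments(r, delta):
    """Increments of S and I per regular extend step, with boundary steps folded in."""
    boundary_per_extend = 2 * delta / (1 - delta)  # one boundary step per (1-delta)/(2*delta) extends
    dS = 2 * r - 3 + (r - 2) * boundary_per_extend
    dI = 1 + boundary_per_extend
    return dS, dI


def expected_growth(eps, r, delta):
    """Expected increase of S in one regular step, an extend occurring with probability eps."""
    dS, _ = effective_increments(r, delta)
    return eps * dS


for r in range(3, 11):
    edge = Fraction(1, 2 * (r - 1))           # supremum allowed for both eps and delta
    print(r, expected_growth(edge, r, edge))  # strictly below 1 for every r
```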

Using Lemma 32 we bound the number of variables marked by the marking process as follows.

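Lemma 33 (Upper Bound): Let $\gamma = 1 - \frac{1+\delta}{r}$ for some $0 < \delta < \frac{1}{2(r-1)}$. Fix $G$ and $W$ such that $W \subseteq G$ and $G \in \mathcal{X}(l, r, \alpha, \gamma)$. Let $c = c(l, r, \epsilon, \delta)$ be the constant appearing in Lemma 32. If $|W| \le \frac{\alpha l}{6c(r-1)r}\,n$, then

$$\lim_{n\to\infty} \frac{1}{n}\,\mathbb{E}_{E'}[M(G, (W, E'), W)] \le \alpha\frac{l}{r}.$$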

Proof: Let $m = \frac{l}{r}n$. The maximum number of surviving edges coming out of the witness $W$ is $3(r-1)|W|$. Let this be $S_0$. Consider the R-process with initial state $U = (0, S_0, 0, 0)$ and the greedy strategy $F^g$. From Lemma 32 there exists a strictly positive constant $c'$ such that

$$P\{I_\infty(U, F^g) > cS_0\} < e^{-c' S_0}.$$

The bound on $|W|$ in the hypothesis implies that $cS_0 = 3c(r-1)|W| \le \frac{\alpha}{2}m$. From Table III we see that any time the number of internal variable nodes is increased by 1, the number of check nodes increases by at most 2. Therefore, $I_\infty(U, F^g) \le cS_0$ implies that $C_\infty \le 2cS_0 \le \alpha m$. This shows that the expansion property is satisfied for the whole duration of the process. Hence, $I_\infty(U, F^g)$ is a valid upper bound for $M(G, (W, E'), W)$.

Let $M(E')$ denote $M(G, (W, E'), W)$. Since $M(E')$ counts the initial $|W| \le \frac{\alpha}{2}m$ variables present in $W$ along with the internal variables created,

$$P\{M(E') > \alpha m\} \le P\{I_\infty(U, F^g) > \tfrac{\alpha}{2}m\} < e^{-c' S_0}.$$

Therefore,

$$\mathbb{E}_{E'}[M(G,(W,E'),W)] \le P\{M(E') \le \alpha m\}\,\alpha m + P\{M(E') > \alpha m\}\,n$$

1 , c'cln , 

< a— n + (1 — e 2r )n. 

The lemma is proved by taking the limit $n \to \infty$. ■

G. Putting It All Together 

In this section we prove Lemma 22 using the results developed in the previous sections.

Proof of Lemma 22: Recall that we consider an $(l = 3, r)$-regular ensemble and that $0 < \epsilon < \epsilon^{\mathrm{LGalB}}$.

Fix $0 < \delta < \frac{1}{2(r-1)}$ and define $\gamma = 1 - \frac{1+\delta}{r}$. Let $\alpha_{\max}(\gamma)$ be the constant defined in Theorem 2. Note that $\alpha_{\max}(\gamma)$ is strictly positive since $\delta$ is strictly positive.

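Choose $0 < \alpha < \alpha_{\max}(\gamma)$. Let $\mathcal{X}(l, r, \alpha, \gamma)$ denote the set of graphs $\{G \in \mathrm{LDPC}(n, l, r) : G \text{ is an } (l, r, \alpha, \gamma) \text{ right expander}\}$. From Theorem 2 we know that

$$P\{G \notin \mathcal{X}\} = o_n(1). \qquad (25)$$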

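Let $c = c(l, r, \epsilon, \delta)$ be the coefficient appearing in Lemma 32 and define $\theta = \frac{\alpha l}{6c(r-1)r}$. From Lemma 26 we know that there exists an iteration $\ell$ such that

$$\lim_{n\to\infty} \frac{1}{n}\mathbb{E}[|W(G, E, \ell)|] \le \frac{1}{2}\theta^2. \qquad (26)$$

Let $n(\theta)$ be such that for $n \ge n(\theta)$, $\mathbb{E}[|W(G, E, \ell)|] \le \theta^2 n$.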

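Using Lemma 28 and splitting the expectation over $\mathcal{X}$ and its complement, we get

$$\mathbb{E}[M(G,E,\ell)] \le \sum_{W:|W|\le\theta n}\ \sum_{G:G\in\mathcal{X}} P\{G\}P\{\mathcal{E}_{G,W}\}\,\mathbb{E}_{E'}[M(G,(W,E'),W)] + \sum_{W:|W|\le\theta n}\ \sum_{G:G\notin\mathcal{X}} P\{G\}P\{\mathcal{E}_{G,W}\}\,\mathbb{E}_{E'}[M(G,(W,E'),W)] + \theta n.$$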






Consider the first term. From Lemma 33 we know that

$$\mathbb{E}_{E'}[M(G,(W,E'),W)] \le \alpha\frac{l}{r}n + o(n). \qquad (27)$$



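Consider the second term. Bound the expectation by $n$ and remove the restriction on the size of the witness. This gives the bound

$$n \sum_{W}\ \sum_{G:G\notin\mathcal{X}} P\{G\}P\{\mathcal{E}_{G,W}\}.$$

Switch the two summations and use the fact that, for a given $G$, each realization $E$ maps to only one $W$. We get

$$\sum_{G:G\notin\mathcal{X}} P\{G\} \sum_{W:W\subseteq G} P\{\mathcal{E}_{G,W}\} = \sum_{G:G\notin\mathcal{X}} P\{G\} = P\{G\notin\mathcal{X}\} = o_n(1). \qquad (28)$$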
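From (27) and (28) we conclude that for $n \ge n(\theta)$,

$$\frac{1}{n}\mathbb{E}[M(G,E,\ell)] \le \sum_{W:|W|\le\theta n}\ \sum_{G:G\in\mathcal{X}} P\{G\}P\{\mathcal{E}_{G,W}\}\Big(\alpha\frac{l}{r} + o_n(1)\Big) + o_n(1) + \theta \le \alpha\frac{l}{r} + \frac{\alpha l}{6c(r-1)r} + o_n(1).$$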
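If we now let $n$ tend to infinity then we get

$$\lim_{n\to\infty}\limsup_{\ell'\to\infty}\mathbb{E}[P_b^{\mathrm{LGalB}}(G,\epsilon,\ell')] \le \limsup_{n\to\infty}\frac{1}{n}\mathbb{E}[M(G,E,\ell)] \le \alpha\frac{l}{r} + \frac{\alpha l}{6c(r-1)r}.$$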

Since this conclusion is valid for any $0 < \alpha < \alpha_{\max}(\gamma)$, it follows that

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\mathrm{LGalB}}(G,\epsilon,\ell)] = 0.$$

■

H. Extensions

1) GalB and $l \ge 4$: Note that for $l \ge 5$ the result is already implied by Theorem ??. For $l = 4$ the proof is easily adapted from the one for $l = 3$. The only differences lie in the way the size of the witness is computed (Section III-D) and in the analysis of the birth-death process (Section III-F).

2) MS and BSC: The proofs can also be extended to other 
decoders. For a given MP decoder, the idea is to define an 
appropriate linearized version of the decoder (LMP) and go 
through the whole machinery as done for GalB. 

For example, consider the MS($M$) decoder and transmission over BSC($\epsilon$). The channel realizations are mapped to $\{\pm 1\}$. Let $M \in \mathbb{N}$; the message alphabet is $\mathcal{M} = \{-M, \dots, M\}$. For transmission of the all-one codeword, the linearized version of the decoder (LMS($M$)) is defined as in Definition 24, i.e., at the check node the outgoing message is the minimum of the incoming messages and the variable node rule is unchanged.

One can check that the LMS algorithm defined above is monotonic with respect to the input log-likelihoods at both the variable and check nodes, and that the number of errors of the MS decoder can be upper bounded by the number of errors of the LMS decoder.

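Lemma 34 (MS($M$) Decoder, BSC and $l \ge 3$): Consider the $(l, r)$ ensemble and transmission over BSC($\epsilon$). Let $\epsilon^{\mathrm{LMS}(M)}$ be the channel parameter below which $p_M^{(\infty)} = 1$. If $\epsilon < \epsilon^{\mathrm{LMS}(M)}$, then

$$\lim_{n\to\infty}\limsup_{\ell\to\infty}\mathbb{E}[P_b^{\mathrm{MS}}(G,\epsilon,\ell)] = 0.$$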

Example 35 (LMS(2) and BSC): Consider communication using an LDPC(3, 6) code over BSC($\epsilon$) and decoding using the MS(2) algorithm. For this setup, the DE threshold is 0.063. The linearized decoder of this algorithm has $p_2^{(\infty)} = 1$ for $\epsilon < 0.031$. Therefore, from Lemma 34, the limits can be exchanged for these values of $\epsilon$.

The proof follows by showing results similar to Lemmas 26 and 33. Here we give a brief explanation of how to adapt the proof to the case $M = 2$ and $l = 3$. For a given $\rho > 0$, we first perform $\ell(\rho)$ iterations such that $p^{(\ell)}_{[-M, M-1]} \le \rho$. We start the marking process from all the edges with messages in $\{-M, \dots, M-1\}$ and their witness. In this case the witness consists of edges which send messages in $\{-M, \dots, M-1\}$.

To show that the size of the witness goes to zero, consider DE equations similar to those in Appendix B. Let $p_\ell^{\mu}(x)$ denote a polynomial with non-negative coefficients where the coefficient of $x^i$ denotes the probability that the message emitted by a variable node at iteration $\ell$ is $\mu$ and that the witness (of depth $\ell$) for this edge has size $i$. Let $q_\ell^{\mu}(x)$ denote the equivalent quantity for messages emitted at check nodes. Then, writing $\bar\epsilon = 1 - \epsilon$, the DE equations for this augmented system are given by:



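$$p_1^{-1}(x) = \epsilon x, \qquad p_1^{1}(x) = \bar\epsilon x,$$

$$\begin{aligned}
p_\ell^{1}(x) &= \epsilon x\big((q_{\ell-1}^{1}(x))^2 + 2q_{\ell-1}^{2}(1)\,q_{\ell-1}^{0}(x)\big) + \bar\epsilon x\big(2q_{\ell-1}^{2}(1)\,q_{\ell-1}^{-2}(x) + 2q_{\ell-1}^{1}(x)\,q_{\ell-1}^{-1}(x) + (q_{\ell-1}^{0}(x))^2\big),\\
p_\ell^{0}(x) &= \epsilon x\big(2q_{\ell-1}^{2}(1)\,q_{\ell-1}^{-1}(x) + 2q_{\ell-1}^{1}(x)\,q_{\ell-1}^{0}(x)\big) + \bar\epsilon x\big(2q_{\ell-1}^{1}(x)\,q_{\ell-1}^{-2}(x) + 2q_{\ell-1}^{0}(x)\,q_{\ell-1}^{-1}(x)\big),\\
p_\ell^{-1}(x) &= \epsilon x\big(2q_{\ell-1}^{2}(1)\,q_{\ell-1}^{-2}(x) + 2q_{\ell-1}^{1}(x)\,q_{\ell-1}^{-1}(x) + (q_{\ell-1}^{0}(x))^2\big) + \bar\epsilon x\big((q_{\ell-1}^{-1}(x))^2 + 2q_{\ell-1}^{-2}(x)\,q_{\ell-1}^{0}(x)\big),\\
p_\ell^{-2}(x) &= 2\epsilon x\,q_{\ell-1}^{-2}(x)\big(q_{\ell-1}^{1}(x) + q_{\ell-1}^{0}(x) + q_{\ell-1}^{-1}(x)\big) + \epsilon x\big(2q_{\ell-1}^{0}(x)\,q_{\ell-1}^{-1}(x) + (q_{\ell-1}^{-1}(x))^2 + (q_{\ell-1}^{-2}(x))^2\big)\\
&\quad + \bar\epsilon x\big(2q_{\ell-1}^{-2}(x)\,q_{\ell-1}^{-1}(x) + (q_{\ell-1}^{-2}(x))^2\big),\\
q_\ell^{\mu}(x) &= \frac{p_\ell^{\mu}(x)}{p_\ell^{\mu}(1)}\Big(\Big(1 - \sum_{\mu'=-M}^{\mu-1} p_\ell^{\mu'}(1)\Big)^{r-1} - \Big(1 - \sum_{\mu'=-M}^{\mu} p_\ell^{\mu'}(1)\Big)^{r-1}\Big), \qquad \mu \in \{-M, \dots, M-1\}.
\end{aligned}$$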



Using the hypothesis $p_M^{(\infty)} = 1$ and an analysis similar to the one in Appendix B, we can show that the size of the witness behaves as $o_\ell(1)$. In the corresponding birth-death process we have to keep track of the size of the set of edges with messages in $\{-M, \dots, M-1\}$.

Similar results can be obtained for the BP($M$) decoder and for channels with continuous outputs. However, the analysis of these decoders is more complicated because we have to deal with densities of messages.

3) MS($M$) and continuous channels: Consider transmission over BMS channels with bounded output log-likelihoods and decoding using the MS($M$) decoder. For this setup it is tempting to conjecture that the proofs can be extended using FKG inequalities for continuous lattices [6].

IV. Conclusion 

We have presented two approaches to the problem of exchanging limits below the DE threshold. The first one, based solely on the expansion property of the graph, proves the result for a large class of MP decoders, but only if the degree is relatively large. To prove the result for smaller degrees one has to take the role of the channel realizations into account. The second approach accomplishes this in some cases. In this paper we only considered channel parameters below the DE threshold, but the regime above this threshold is equally interesting. One important application of proving the exchange of limits in this regime is the finite-length analysis via a scaling approach [7], since the computation of the scaling parameters relies heavily on this exchange being permissible.
Appendix

A. Expansion Argument For Block Error Probability 

The following theorem is a modified version of a theorem 
by Burshtein and Miller [5]. 

Theorem 36 (Expansion): Consider an $(l, r, \alpha, \gamma)$ left expander. Assume that $0 < \beta < 1$ is such that $\beta(l-1) \in \mathbb{N}$ and that $\beta\frac{l-1}{l} < 2\gamma - 1$. If at some iteration $\ell$ the number of bad variable nodes is less than $\frac{\alpha}{2}n$, then the MP algorithm will decode successfully.

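Proof: Let $B_\ell$ denote the bad set in iteration $\ell$. We claim that

$$\gamma l\,|B_\ell \cup B_{\ell+1}| \overset{(i)}{\le} |\mathcal{N}(B_\ell \cup B_{\ell+1})| \overset{(ii)}{\le} |\mathcal{N}(B_\ell)| + \beta(l-1)\,|B_{\ell+1}\setminus B_\ell|. \qquad (29)$$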



Step (ii) follows from the fact that each variable in $B_{\ell+1}\setminus B_\ell$ must be connected to at least $l - \beta(l-1)$ checks in the set $\mathcal{N}(B_\ell)$, since otherwise this variable would be good and would not be in $B_{\ell+1}$. Therefore the number of edges coming out of $B_{\ell+1}\setminus B_\ell$ that do not connect to $\mathcal{N}(B_\ell)$ is at most $\beta(l-1)|B_{\ell+1}\setminus B_\ell|$. Thus the number of neighbors of $B_{\ell+1}\setminus B_\ell$ that are not already neighbors of $B_\ell$ is at most $\beta(l-1)|B_{\ell+1}\setminus B_\ell|$.

Consider now step (i). This step follows in a straightforward fashion from the expansion property since by assumption $|B_\ell| < \frac{\alpha}{2}n$, so that $|B_\ell \cup B_{\ell+1}| \le \alpha n$.

Let $T$ be the set of check nodes that are connected to $B_\ell \cap B_{\ell+1}$ but not connected to $B_\ell\setminus B_{\ell+1}$. Suppose an edge from a check node in $T$ is carrying a bad message. Then this check must be connected to one more variable in $B_\ell \cap B_{\ell+1}$, because it is not connected to $B_\ell\setminus B_{\ell+1}$ and thus cannot get a bad message from $B_\ell\setminus B_{\ell+1}$. For each variable in $B_\ell \cap B_{\ell+1}$, at least $l - \beta(l-1)$ edges must carry bad messages, and hence it can connect to at most $\frac{l-\beta(l-1)}{2} + \beta(l-1) = \frac{l}{2} + \frac{\beta(l-1)}{2}$ check nodes. Therefore we have



$$|\mathcal{N}(B_\ell)| \le l\,|B_\ell\setminus B_{\ell+1}| + \Big(\frac{l}{2} + \frac{\beta(l-1)}{2}\Big)|B_\ell \cap B_{\ell+1}|. \qquad (30)$$



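Using equations (29) and (30), we get

$$\begin{aligned}
\gamma l\,|B_{\ell+1}\cup B_\ell| &\le l\,|B_\ell\setminus B_{\ell+1}| + \frac{l+\beta(l-1)}{2}\,|B_{\ell+1}\cap B_\ell| + \beta(l-1)\,|B_{\ell+1}\setminus B_\ell|,\\
\gamma|B_{\ell+1}\cap B_\ell| + \gamma|B_\ell\setminus B_{\ell+1}| + \gamma|B_{\ell+1}\setminus B_\ell| &\le |B_\ell\setminus B_{\ell+1}| + \frac{1+\beta\frac{l-1}{l}}{2}\,|B_{\ell+1}\cap B_\ell| + \beta\frac{l-1}{l}\,|B_{\ell+1}\setminus B_\ell|,\\
|B_{\ell+1}\setminus B_\ell| &\le \frac{1-\gamma}{\gamma-\beta\frac{l-1}{l}}\,|B_\ell\setminus B_{\ell+1}| + \frac{1+\beta\frac{l-1}{l}-2\gamma}{2\big(\gamma-\beta\frac{l-1}{l}\big)}\,|B_\ell\cap B_{\ell+1}|.
\end{aligned}$$

By the assumption $\beta\frac{l-1}{l} < 2\gamma - 1$, the coefficient of the first term on the right-hand side is less than 1 and the coefficient of the second term is negative, and hence $|B_{\ell+1}\setminus B_\ell| < |B_\ell\setminus B_{\ell+1}|$ whenever $B_\ell$ is non-empty. In other words, the number of bad variables strictly decreases from one iteration to the next until it reaches zero. ■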

B. Size of Witness 

Proof of Lemma 26: Let $G$ be a graph and let $E$ be the noise realization. Assume that we perform $\ell$ iterations. Let $W_e(G, E, \ell)$ denote the witness of edge $e$. Then

$$\mathbb{E}[|W(G,E,\ell)|] \le \sum_{i=1}^{ln}\mathbb{E}[|W_{e_i}(G,E,\ell)|] = nl\,\mathbb{E}[|W_{e_1}(G,E,\ell)|].$$

It remains to compute the expected size of the witness in the limit of $n$ tending to infinity for a fixed $\ell$. This can be accomplished by DE.

Let $x_\ell$ denote the probability of an edge being in error according to DE. Let $p_\ell(x)$ denote a polynomial with non-negative coefficients where the coefficient of $x^i$ denotes the probability that the message emitted by a variable node at iteration $\ell$ is bad and that the witness (of depth $\ell$) for this edge has size $i$ ($i$ variable nodes). Let $q_\ell(x)$ denote the equivalent quantity for messages emitted at check nodes. Writing $\bar\epsilon = 1 - \epsilon$, the DE equations for this augmented system are:

$$\begin{aligned}
p_1(x) &= \epsilon x,\\
p_\ell(x) &= \epsilon\,(2 - q_\ell(1))\,q_\ell(x)\,x + \bar\epsilon\,q_\ell(x)^2\,x,\\
q_\ell(x) &= \frac{p_{\ell-1}(x)}{p_{\ell-1}(1)}\big(1 - (1 - p_{\ell-1}(1))^{r-1}\big).
\end{aligned}$$

The initialization $p_1(x) = \epsilon x$ reflects the fact that with probability $\epsilon$ a variable-to-check message is in error in iteration 1 and that its associated witness of depth 1 consists only of the attached variable (hence the $x$).
The recursion for $q_\ell(x)$ is also straightforward. With probability $1 - (1 - p_{\ell-1}(1))^{r-1}$ at least one of the $r-1$ incoming messages at a check node is bad, and in this case the distribution of the size of the attached witness is $\frac{p_{\ell-1}(x)}{p_{\ell-1}(1)}$.

Let us now look at the recursion for $p_\ell(x)$. There are three contributions. (i) Suppose that the variable has a bad received value and that exactly one of the incoming edges is bad; this happens with probability $2\epsilon(1 - q_\ell(1))q_\ell(1)$, and in this case the distribution of the size of the witness attached to this edge is $\frac{q_\ell(x)\,x}{q_\ell(1)}$, where the extra $x$ accounts for the attached variable node. (ii) Suppose that the variable has a bad received value and that both incoming edges are bad; this happens with probability $\epsilon q_\ell(1)^2$, and in this case the distribution of the size of the witness attached to this edge is again $\frac{q_\ell(x)\,x}{q_\ell(1)}$, since together with the bad received value a single bad incoming message already makes the outgoing message bad. (iii) Finally, suppose that the variable has a good received value and that both incoming edges are bad; this happens with probability $\bar\epsilon q_\ell(1)^2$, and in this case the distribution of the size of the witness attached to this edge is $\frac{q_\ell(x)^2\,x}{q_\ell(1)^2}$. Adding (i) and (ii) gives the term $\epsilon(2 - q_\ell(1))q_\ell(x)x$, and (iii) gives the term $\bar\epsilon q_\ell(x)^2 x$.

Note that we get standard DE by setting $x = 1$, i.e., we have $x_\ell = p_\ell(1)$. We want to show that $p'_\ell(1)$ (this is the expected size of the witness in the limit of infinite blocklength) converges to zero as a function of $\ell$.

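The augmented DE equation is difficult to handle. So let us first write down a scalar version that tracks the expected value. Define $\beta_\ell = \frac{1 - (1 - p_\ell(1))^{r-1}}{p_\ell(1)}$, so that $q_\ell(x) = \beta_{\ell-1}p_{\ell-1}(x)$. Then we get

$$p_\ell(x) = \epsilon\,(2 - q_\ell(1))\,\beta_{\ell-1}\,p_{\ell-1}(x)\,x + \bar\epsilon\,\beta_{\ell-1}^2\,p_{\ell-1}(x)^2\,x.$$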

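Differentiate both sides with respect to $x$. This gives

$$p'_\ell(x) = \epsilon\beta_{\ell-1}(2 - q_\ell(1))\big(p'_{\ell-1}(x)\,x + p_{\ell-1}(x)\big) + \bar\epsilon\beta_{\ell-1}^2\,(p_{\ell-1}(x))^2 + \bar\epsilon\beta_{\ell-1}^2\,2p_{\ell-1}(x)\,p'_{\ell-1}(x)\,x.$$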

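Now substitute $x = 1$. Recall that $x_\ell = p_\ell(1)$ and define $\rho_\ell = p'_\ell(1)$. Further, bound $2 - q_\ell(1)$ by 2 and $\beta_\ell$ by $r - 1$ (the latter since $1 - (1-p)^{r-1} \le (r-1)p$). This gives the inequality

$$\rho_\ell \le 2\epsilon(r-1)\rho_{\ell-1} + 2\epsilon(r-1)x_{\ell-1} + \bar\epsilon(r-1)^2x_{\ell-1}^2 + 2\bar\epsilon(r-1)^2x_{\ell-1}\rho_{\ell-1}.$$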

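We claim that $\ell x_\ell \le \rho_\ell$. This is true since $x_\ell$ is the probability of a bad message, whereas $\rho_\ell$ is the expected size of the witness, and the witness size is always at least $\ell$ if the message is bad. Therefore,

$$\frac{\rho_\ell}{\rho_{\ell-1}} \le 2\epsilon(r-1) + 2\epsilon(r-1)\frac{x_{\ell-1}}{\rho_{\ell-1}} + \bar\epsilon(r-1)^2\frac{x_{\ell-1}^2}{\rho_{\ell-1}} + 2\bar\epsilon(r-1)^2x_{\ell-1} \le 2\epsilon(r-1) + 2\epsilon\frac{r-1}{\ell-1} + 3\bar\epsilon(r-1)^2x_{\ell-1}.$$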

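Now note that $x_\ell$ tends to zero since $\epsilon < \epsilon^{\mathrm{LGalB}}$. Therefore, if $2\epsilon(r-1) < 1$, then the right-hand side is bounded by a constant strictly smaller than 1 for $\ell$ sufficiently large, so that $\rho_\ell/\rho_{\ell-1} < 1$. The stability condition implies $\epsilon^{\mathrm{LGalB}} \le \frac{1}{2(r-1)}$. Therefore, for $\epsilon < \epsilon^{\mathrm{LGalB}}$, $\rho_\ell$ tends to zero exponentially fast in $\ell$.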
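The scalar recursion can be run directly. Rather than iterating the full polynomials (whose degree roughly doubles each iteration), the sketch below propagates only $x_\ell = p_\ell(1)$ and $\rho_\ell = p'_\ell(1)$ using the exact derivative of the recursion above, without the bounds $2 - q_\ell(1) \le 2$ and $\beta_\ell \le r-1$. The parameter values ($\epsilon = 0.01$, $r = 6$) are illustrative choices of ours, and `witness_de` is a hypothetical name.

```python
import math


def witness_de(eps, r, iterations):
    """Track x_l = p_l(1) and rho_l = p_l'(1) for the augmented GalB DE (l = 3) of this appendix."""
    x, rho = eps, eps  # p_1(x) = eps * x, so p_1(1) = p_1'(1) = eps
    history = [(1, x, rho)]
    for ell in range(2, iterations + 1):
        if x == 0.0:
            break
        q = -math.expm1((r - 1) * math.log1p(-x))  # q_l(1) = 1 - (1 - x)^(r-1), stable for tiny x
        beta = q / x                                # beta_{l-1}, so that q_l(x) = beta_{l-1} p_{l-1}(x)
        dq = beta * rho                             # q_l'(1)
        x_new = eps * (2 - q) * q + (1 - eps) * q * q
        rho_new = eps * (2 - q) * (dq + q) + (1 - eps) * (2 * q * dq + q * q)
        x, rho = x_new, rho_new
        history.append((ell, x, rho))
    return history


for ell, x, rho in witness_de(eps=0.01, r=6, iterations=12):
    print(ell, x, rho, rho / (ell * x))  # last column is at least 1, i.e. rho_l >= l * x_l
```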



C. Randomization 

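Proof of Lemma 28: We have

$$\mathbb{E}[M(G,E,\ell)] = \sum_{W,G}P\{G\}\,\mathbb{E}_E\big[M(G,E,W)\,\mathbb{1}_{\{W(G,E,\ell)=W\}}\big] = \sum_{W,G}P\{G\}\,\mathbb{E}_E\big[M(G,E,W)\,\mathbb{1}_{\{E\in\mathcal{E}_{G,W}\}}\big].$$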

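For all $E \in \mathcal{E}_{G,W}$, the channel values on $W$ are fixed to those appearing in the witness, which is also denoted by $W$. Recall that $\mathcal{E}'_{G,W}$ is the projection of $\mathcal{E}_{G,W}$ on $G\setminus W$ and $E' \in \mathcal{E}'_{G,W}$. The above expectation is equivalent to

$$\mathbb{E}_E\big[M(G,(W,E'),W)\,\mathbb{1}_{\{(W,E')\in\mathcal{E}_{G,W}\}}\big] = P(W)\,\mathbb{E}_{E'}\big[M(G,(W,E'),W)\,\mathbb{1}_{\{E'\in\mathcal{E}'_{G,W}\}}\big],$$

where $P(W)$ is the probability of the channel values on $W$. This implies $P(W)P(\mathcal{E}'_{G,W}) = P(\mathcal{E}_{G,W})$. Using (20) we bound

$$\mathbb{E}_{E'}\big[M(G,(W,E'),W)\,\mathbb{1}_{\{E'\in\mathcal{E}'_{G,W}\}}\big] \le P(\mathcal{E}'_{G,W})\,\mathbb{E}_{E'}[M(G,(W,E'),W)].$$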

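Therefore,

$$\begin{aligned}
\mathbb{E}[M(G,E,\ell)] &\le \sum_{W,G}P\{G\}P\{\mathcal{E}_{G,W}\}\,\mathbb{E}_{E'}[M(G,(W,E'),W)]\\
&= \sum_{W:|W|\le\theta n,\,G}P\{G\}P\{\mathcal{E}_{G,W}\}\,\mathbb{E}_{E'}[M(G,(W,E'),W)] + \sum_{W:|W|>\theta n,\,G}P\{G\}P\{\mathcal{E}_{G,W}\}\,\mathbb{E}_{E'}[M(G,(W,E'),W)].
\end{aligned}$$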

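Consider the second term in the last line. Bound the expectation by $n$. This yields

$$n\sum_{W:|W|>\theta n,\,G}P\{G\}P\{\mathcal{E}_{G,W}\}.$$

If $W \not\subseteq G$, then $\mathcal{E}_{G,W}$ is empty. Therefore the above bound is equivalent to

$$n\sum_{W:|W|>\theta n}\mathbb{E}\big[\mathbb{1}_{\{W\subseteq G\}}\mathbb{1}_{\{E\in\mathcal{E}_{G,W}\}}\big] = n\sum_{W:|W|>\theta n}\mathbb{E}\big[\mathbb{1}_{\{W(G,E,\ell)=W\}}\big] = n\,P\{|W(G,E,\ell)| > \theta n\}.$$

By assumption, $\mathbb{E}[|W(G,E,\ell)|] \le \theta^2 n$. The Markov inequality therefore shows that

$$P\{|W(G,E,\ell)| > \theta n\} \le \frac{\mathbb{E}[|W(G,E,\ell)|]}{\theta n} \le \theta.$$

■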
D. FKG Inequality

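Consider the Hamming space $\{0,1\}^n$. For $x, y \in \{0,1\}^n$ define the following partial order: $x \le y$ iff $x_i \le y_i$ for all $i$. Define $x_{\le}$ as

$$x_{\le} = \{y : y \in \{0,1\}^n,\ y \le x\}, \qquad (31)$$

and $x \vee y$ and $x \wedge y$ as

$$(x \vee y)_i = \begin{cases} 0 & \text{if } x_i = y_i = 0,\\ 1 & \text{else,} \end{cases} \qquad (32)$$

$$(x \wedge y)_i = \begin{cases} 1 & \text{if } x_i = y_i = 1,\\ 0 & \text{else.} \end{cases} \qquad (33)$$

We say that a function $f : \{0,1\}^n \to \mathbb{R}$ is monotonically increasing (decreasing) if $f(x) \ge f(y)$ whenever $x \ge y$ ($x \le y$).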

Lemma 37 (FKG Inequality [8]): Let $P\{\cdot\}$ be a probability measure on $\{0,1\}^n$ such that

$$P\{x\}P\{y\} \le P\{x \vee y\}P\{x \wedge y\} \quad \text{for all } x, y.$$

Let $f$ and $g$ be real-valued non-negative functions on $\{0,1\}^n$. If $f$ and $g$ are either both monotonically increasing or both monotonically decreasing, then

$$\mathbb{E}[f(x)g(x)] \ge \mathbb{E}[f(x)]\,\mathbb{E}[g(x)].$$
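Lemma 37 is easy to check by brute force on a small cube. The sketch below uses a product (independent Bernoulli) measure, which satisfies the lattice condition with equality, together with two increasing functions; the choice of measure and functions is ours and only illustrates the statement, with join and meet implemented as in (32) and (33).

```python
from itertools import product


def product_measure(ps):
    """Independent Bernoulli(p_i) measure on {0,1}^n."""
    def prob(x):
        out = 1.0
        for xi, pi in zip(x, ps):
            out *= pi if xi else 1.0 - pi
        return out
    return prob


def join(x, y):  # (x v y)_i as in (32)
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):  # (x ^ y)_i as in (33)
    return tuple(min(a, b) for a, b in zip(x, y))


def fkg_check(ps, f, g):
    """Return (lattice condition holds, E[fg], E[f] E[g]) by enumerating {0,1}^n."""
    P = product_measure(ps)
    cube = list(product((0, 1), repeat=len(ps)))
    lattice = all(P(x) * P(y) <= P(join(x, y)) * P(meet(x, y)) + 1e-15
                  for x in cube for y in cube)
    Ef = sum(P(x) * f(x) for x in cube)
    Eg = sum(P(x) * g(x) for x in cube)
    Efg = sum(P(x) * f(x) * g(x) for x in cube)
    return lattice, Efg, Ef * Eg


# Two increasing functions: the number of ones, and the indicator that some coordinate is one
print(fkg_check((0.2, 0.5, 0.7), sum, max))
```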



E. Birth and Death Process 

Consider the following birth and death process. We start with $X_0 = a > 0$. At step $t$, $t \in \mathbb{N}$, if $X_{t-1} < 1$ then we stop the process and define $X_{t'} = X_{t-1}$ for $t' \ge t$. Otherwise we decrease $X_{t-1}$ by 1 and add $Y_t$, where the sequence $\{Y_t\}_{t\ge1}$ is iid. In this way, as long as $X_{t-1} \ge 1$,

$$X_t = X_{t-1} - 1 + Y_t.$$

This process is equivalent to the standard birth and death process if $Y_t$ takes non-negative integer values. In this case, the step described above corresponds to choosing a member of the population, which then creates $Y_t$ offspring and dies.

Let $T$ denote the stopping time, i.e., $T = \min\{t : X_t < 1\}$.

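Lemma 38 (Birth-Death): Fix $p \in (0, 1]$ and $0 < \mu < 1$. Consider a birth and death process with $X_0 = a \in \mathbb{N}$ and

$$Y_i = \begin{cases} \frac{\mu}{p}, & \text{with probability } p,\\ 0, & \text{with probability } 1-p, \end{cases}$$

so that $\mathbb{E}[Y_i] = \mu$. Then, for $\beta a \in \mathbb{N}$,

$$P\{T > \beta a\} \le e^{-a\,c(p, \mu, \beta)},$$

where $c(p, \mu, \beta) > 0$ for $\beta > \frac{1}{1-\mu}$.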

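Proof: Let $b = \beta a$. Note that

$$P\{T > b\} \le P\{X_b \ge 1\} \le P\{X_b > 0\}.$$

Let $\tilde Y_t = Y_t - 1$. On the event $\{T > b\}$ the process has not stopped before step $b$, so there

$$X_b = X_{b-1} + \tilde Y_b = X_{b-2} + \tilde Y_{b-1} + \tilde Y_b = a + \sum_{i=1}^{b}\tilde Y_i.$$

Therefore, for any $s \ge 0$,

$$P\{T > b\} \le P\Big\{\sum_{i=1}^{b}\tilde Y_i > -a\Big\} = P\Big\{e^{s\sum_{i=1}^{b}\tilde Y_i} > e^{-as}\Big\} \le e^{as}\Big((1-p)e^{-s} + p\,e^{s(\frac{\mu}{p}-1)}\Big)^{b}.$$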



First consider the case $\mu > p$. Set $s = \frac{p}{\mu}\ln\frac{(1-p)(\beta-1)}{p+\beta(\mu-p)}$, which is strictly positive since $\mu > p$ and $\beta > \frac{1}{1-\mu}$. Set $\beta = \frac{1+\xi}{1-\mu}$, where $\xi > 0$. With this choice we get

p{r>6}<[ ^- p) ( ^-p^-Zp ) E ^' 

For $\xi = 0$ the term inside the square brackets is 1. If we take the derivative of the expression inside the square brackets with respect to $\xi$ we get

-p ( +m-p)\ i - tSe F & log + o(i - p) 



For $\xi > 0$ and $\mu > p$ this is strictly negative, which proves our claim.

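Now consider the case $\mu < p$. For $\frac{1}{1-\mu} < \beta < \frac{p}{p-\mu}$ the above still applies. For $\beta > \frac{p}{p-\mu}$ the probability is 0. This is because in each step we can add at most $\frac{\mu}{p} - 1 < 0$. Therefore, for $t > \frac{p}{p-\mu}a + 1$, $X_t < a + \big(\frac{p}{p-\mu}a + 1\big)\big(\frac{\mu}{p} - 1\big) < 0$. ■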
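The Chernoff step can be compared with the exact probability it bounds. The sketch below picks illustrative parameters with $\mu > p$ and $\beta > \frac{1}{1-\mu}$ (our choice), computes $P\{\sum_{i=1}^{b}\tilde Y_i > -a\}$ exactly from the binomial distribution (this quantity upper bounds $P\{T > b\}$, as in the proof), and minimizes the bound over a grid of $s$ instead of using a closed-form choice of $s$. The function names are hypothetical.

```python
import math


def exact_sum_tail(a, b, p, mu):
    """P{ sum_{i=1}^b (Y_i - 1) > -a } where Y_i = mu/p with probability p and 0 otherwise."""
    # With K ~ Binomial(b, p) successes, the sum equals K * mu / p - b.
    return sum(math.comb(b, k) * p ** k * (1 - p) ** (b - k)
               for k in range(b + 1) if k * mu / p - b > -a)


def chernoff(a, b, p, mu, s):
    """The bound e^{as} ((1-p) e^{-s} + p e^{s(mu/p - 1)})^b from the proof, valid for any s >= 0."""
    return math.exp(a * s) * ((1 - p) * math.exp(-s) + p * math.exp(s * (mu / p - 1))) ** b


a, beta, p, mu = 20, 3, 0.25, 0.5  # beta = 3 > 1/(1 - mu) = 2
b = beta * a
best = min(chernoff(a, b, p, mu, s / 100) for s in range(1, 201))  # grid over s in (0, 2]
print(exact_sum_tail(a, b, p, mu), best)
```

Since the bound holds for every $s \ge 0$, any grid point gives a valid bound; the grid only makes it reasonably tight.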

F. Concentration

Theorem 39 (Concentration Theorem [1, p. 222]): Let $G$, chosen uniformly at random from $\mathrm{LDPC}(n, \lambda, \rho)$, be used for transmission over a BMS($\epsilon$) channel. Assume that the decoder performs $\ell$ rounds of message-passing decoding and let $P_b^{\mathrm{MP}}(G, \epsilon, \ell)$ denote the resulting bit error probability. Then, for any given $\delta > 0$, there exists an $\alpha > 0$, $\alpha = \alpha(\lambda, \rho, \delta)$, such that

$$P\big\{|P_b^{\mathrm{MP}}(G,\epsilon,\ell) - \mathbb{E}_{\mathrm{LDPC}(n,\lambda,\rho)}[P_b^{\mathrm{MP}}(G,\epsilon,\ell)]| > \delta\big\} \le e^{-\alpha n}.$$